Outlier Discovery: Distance-Based Approach

  • Introduced to overcome the main limitations of statistical methods
    • Statistical tests assume a known data distribution; we need multi-dimensional analysis without knowing the distribution
  • Distance-based outlier: A DB(p, D)-outlier is an object O in a dataset T such that at least a fraction p of the objects in T lies at a distance greater than D from O
  • Algorithms for mining distance-based outliers [Knorr & Ng, VLDB’98]
    • Index-based algorithm
    • Nested-loop algorithm
    • Cell-based algorithm
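
The DB(p, D) definition above can be checked directly by comparing every pair of objects. The sketch below is a naive pairwise version, not the block-oriented nested-loop algorithm from the cited paper. The function name `db_outliers`, the use of Euclidean distance, and the choice to take the fraction over all n objects (including O itself, which is never farther than D from itself) are assumptions made for illustration.

```python
import math

def db_outliers(points, p, D):
    """Return indices of DB(p, D)-outliers found by naive pairwise comparison.

    Assumptions (not from the source): Euclidean distance, and the
    fraction p is measured against all n objects in the dataset.
    """
    n = len(points)
    outliers = []
    for i, o in enumerate(points):
        # Count the objects lying at a distance greater than D from o.
        far = sum(1 for q in points if math.dist(o, q) > D)
        # o is an outlier if at least a fraction p of the objects are that far.
        if far >= p * n:
            outliers.append(i)
    return outliers

# Four tightly clustered points and one isolated point.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
result = db_outliers(data, p=0.7, D=1.0)
print(result)
```

This runs in O(n²) distance computations. Index-based and cell-based approaches aim to reduce that cost, for example by stopping early once enough near neighbors of an object have been found.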

