Approach II: Finding Outliers in Subspaces

  • Extending conventional outlier detection: the detected outliers are hard to interpret
  • Find outliers in much lower dimensional subspaces: easy to interpret why and to what extent the object is an outlier
    • E.g., find outlier customers in certain subspace: average transaction amount >> avg. and purchase frequency << avg.
  • Ex. A grid-based subspace outlier detection method
    • Project data onto various subspaces to find an area whose density is much lower than average
    • Discretize the data into a grid with φ equi-depth (why?) regions
    • Search for regions that are significantly sparse
      • Consider a k-d cube formed by k ranges on k dimensions, in a data set of n objects
      • If objects are independently distributed, the expected number of objects falling into a k-dimensional region is (1/φ)^k n = f^k n, where f = 1/φ; the standard deviation is

\[ \sqrt{f^{k}(1-f^{k})n} \]

      • The sparsity coefficient of cube C, where n(C) is the number of objects in C:

\[ S(C)=\frac{n(C)-f^{k}n}{\sqrt{f^{k}(1-f^{k})n}} \]

      • If S(C) < 0, C contains fewer objects than expected
      • The more negative, the sparser C is and the more likely the objects in C are outliers in the subspace
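
The grid-based steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function names (equi_depth_bins, sparsity_coefficient, sparsest_cubes) are hypothetical, the search simply enumerates every k-dimensional subspace (practical only for small data), and skipping empty cubes is an assumption made here because an empty cube has no objects to report.

```python
import math
from collections import Counter
from itertools import combinations


def equi_depth_bins(values, phi):
    """Map each value to a bin index in 0..phi-1 so bins hold (nearly) equal counts."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * phi // n
    return bins


def sparsity_coefficient(count, n, phi, k):
    """S(C) = (n(C) - f^k n) / sqrt(f^k (1 - f^k) n), with f = 1/phi (assumes phi >= 2)."""
    f_k = (1.0 / phi) ** k
    return (count - f_k * n) / math.sqrt(f_k * (1.0 - f_k) * n)


def sparsest_cubes(data, phi, k, top=3):
    """Return up to `top` non-empty cubes as (S, dims, cell, count), most negative S first."""
    n, d = len(data), len(data[0])
    # Discretize every dimension independently into phi equi-depth ranges.
    binned = [equi_depth_bins([row[j] for row in data], phi) for j in range(d)]
    scored = []
    # Assumption: brute-force enumeration of all k-dimensional subspaces.
    for dims in combinations(range(d), k):
        counts = Counter(tuple(binned[j][i] for j in dims) for i in range(n))
        for cell, count in counts.items():
            scored.append((sparsity_coefficient(count, n, phi, k), dims, cell, count))
    scored.sort(key=lambda r: r[0])
    return scored[:top]


# 100 objects, 2 ranges, 1 dimension: expected 50, standard deviation 5.
print(sparsity_coefficient(count=40, n=100, phi=2, k=1))  # -2.0
```

Objects that fall into the cubes returned by sparsest_cubes are the candidate outliers for that subspace. Because every range holds about n/φ objects under equi-depth discretization, f = 1/φ is the same for every range, which is what lets a single expected count f^k n apply to every cube.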
