Distance-Based Outlier Detection

 
  • For each object o, count the number of other objects in the r-neighborhood of o, where r is a user-specified distance threshold
  • An object o is an outlier if most (taking π as a fraction threshold) of the objects in D are far away from o, i.e., not in the r-neighborhood of o
  • An object o is a DB(r, π) outlier if

\[ \frac{\|\{o' \mid dist(o, o') \leq r\}\|}{\|D\|} \leq \pi \]

  • Equivalently, one can check the distance between o and its k-th nearest neighbor o_k, where
    \[ k=\left \lceil \pi \|D\| \right \rceil \]
    o is an outlier if dist(o, o_k) > r
  • Efficient computation: Nested loop algorithm
    • For any object o_i, calculate its distance from the other objects, and count the number of other objects in the r-neighborhood.
    • If π∙n other objects are within distance r (where n = ||D||), terminate the inner loop: o_i cannot be an outlier
    • Otherwise, o_i is a DB(r, π) outlier
  • Efficiency: the worst case is O(n^2), but in practice CPU time is often close to linear in the data set size, since for most non-outlier objects the inner loop terminates early
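
The nested loop above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `db_outliers` and `is_outlier_knn`, the parameter name `pi_frac` (standing for π), and the use of Euclidean distance via `math.dist` (Python 3.8+) are all assumptions made for the example. Following the loop bullets, an object is ruled out as an outlier as soon as ⌈π∙n⌉ other objects are found within distance r; at the exact boundary count = π∙n this differs from the "≤ π" form of the ratio definition.

```python
import math


def db_outliers(points, r, pi_frac):
    """Return the indices of DB(r, pi) outliers found by a nested loop.

    points  : list of equal-length coordinate tuples (the data set D)
    r       : distance threshold
    pi_frac : fraction threshold (the slide's pi)
    """
    n = len(points)
    # Neighbor count at which an object can no longer be an outlier.
    k = max(1, math.ceil(pi_frac * n))
    outliers = []
    for i, o in enumerate(points):
        count = 0
        is_outlier = True
        for j, other in enumerate(points):
            if i == j:
                continue  # count only *other* objects
            if math.dist(o, other) <= r:
                count += 1
                if count >= k:
                    # Enough neighbors found: stop the inner loop early.
                    is_outlier = False
                    break
        if is_outlier:
            outliers.append(i)
    return outliers


def is_outlier_knn(points, i, r, pi_frac):
    """Equivalent check using the distance to the k-th nearest neighbor."""
    k = max(1, math.ceil(pi_frac * len(points)))
    dists = sorted(math.dist(points[i], p)
                   for j, p in enumerate(points) if j != i)
    # If fewer than k other objects exist, the count can never reach k.
    return k > len(dists) or dists[k - 1] > r
```

For example, with four points on the unit square plus one point at (10, 10), calling `db_outliers(points, r=2.0, pi_frac=0.5)` flags only the distant point, and `is_outlier_knn` agrees for every index. The early `break` is what makes the typical running time much better than the quadratic worst case: a point inside a dense cluster usually finds its k neighbors after scanning a small part of D.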
