Distance-Based Outlier Detection
- For each object o, examine the # of other objects in the r-neighborhood of o, where r is a user-specified distance threshold
- An object o is an outlier if most (taking π as a fraction threshold) of the objects in D are far away from o, i.e., not in the r-neighborhood of o
- An object o is a DB(r, π) outlier if
\[ \frac{||\{o' \mid dist(o,o') \leq r\}||}{||D||} \leq \pi \]
- Equivalently, one can check the distance between o and its k-th nearest neighbor o_k, where
\[ k=\left \lceil \pi ||D|| \right \rceil \]
o is an outlier if dist(o, o_k) > r
- Efficient computation: Nested loop algorithm
- For each object o_i, calculate its distance to the other objects, and count the # of other objects in the r-neighborhood of o_i.
- If π∙n other objects are within distance r of o_i, terminate the inner loop: o_i is not an outlier
- Otherwise, o_i is a DB(r, π) outlier
- Efficiency: The worst case is O(n^2) distance computations, but in practice CPU time is often close to linear in the data set size, since for most non-outlier objects the inner loop terminates early
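The nested loop with early termination can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `db_outliers`, the tuple-based point representation, and the use of Euclidean distance are assumptions. It follows the k-th nearest neighbor formulation, so an object counts as a non-outlier as soon as k = ⌈π∙n⌉ other objects are found within distance r.

```python
import math

def db_outliers(points, r, pi):
    """Return indices of DB(r, pi) outliers (hypothetical helper, Euclidean distance assumed)."""
    n = len(points)
    k = math.ceil(pi * n)  # number of r-neighbors that rules out outlier status
    outliers = []
    for i, o in enumerate(points):
        count = 0
        for j, other in enumerate(points):
            if i == j:
                continue  # only *other* objects are counted
            if math.dist(o, other) <= r:
                count += 1
                if count >= k:
                    break  # enough neighbors found: stop the inner loop early
        if count < k:
            outliers.append(i)  # fewer than k neighbors within r
    return outliers

# Four clustered points and one distant point
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(db_outliers(points, r=2.0, pi=0.5))  # [4]
```

With n = 5 and π = 0.5, k = 3. Each clustered point finds its three neighbors and exits the inner loop early, while (10, 10) scans all other points, finds none within r, and is reported as an outlier.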