Non-Parametric Methods: Detection Using Histogram
The model of normal data is learned from the input data without any a priori structure.
Such methods often make fewer assumptions about the data and are thus applicable in more scenarios
Outlier detection using histogram:
Figure shows the histogram of purchase amounts in transactions
A transaction in the amount of $7,500 is an outlier, since only 0.2% of transactions have an amount higher than $5,000
Problem: Hard to choose an appropriate bin size for histogram
Bin size too small → normal objects fall into empty/rare bins (false positives)
Bin size too large → outliers fall into some frequent bins (false negatives)
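The bin-size problem can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and does not come from the slide's figure: the synthetic purchase amounts, the function name `histogram_outlier_flags`, and the 1% cutoff `min_fraction`. An object is flagged when its bin holds less than `min_fraction` of the data.

```python
import numpy as np

def histogram_outlier_flags(values, query, n_bins, min_fraction=0.01):
    """Flag query points whose histogram bin holds < min_fraction of the data."""
    values = np.asarray(values, dtype=float)
    query = np.asarray(query, dtype=float)
    counts, edges = np.histogram(values, bins=n_bins)
    fractions = counts / counts.sum()
    # Locate the bin of each query point; the last bin is closed on the right.
    idx = np.searchsorted(edges, query, side="right") - 1
    idx = np.clip(idx, 0, n_bins - 1)
    # Points outside the histogram range are treated as lying in an empty bin.
    inside = (query >= edges[0]) & (query <= edges[-1])
    bin_fraction = np.where(inside, fractions[idx], 0.0)
    return bin_fraction < min_fraction

# Hypothetical data: 999 amounts spread evenly over $0-$5,000, plus one at $7,500.
amounts = np.concatenate([np.linspace(0.0, 5000.0, 999), [7500.0]])
query = np.array([2000.0, 7500.0])  # a typical amount and the suspicious one

flags_moderate = histogram_outlier_flags(amounts, query, n_bins=10)
flags_too_small = histogram_outlier_flags(amounts, query, n_bins=5000)
flags_too_large = histogram_outlier_flags(amounts, query, n_bins=2)

print(flags_moderate)   # only $7,500 is flagged
print(flags_too_small)  # $2,000 is flagged as well: false positive
print(flags_too_large)  # $7,500 is not flagged: false negative
```

With 10 bins only the $7,500 amount is flagged. With 5,000 bins every bin holds at most one object, so the ordinary $2,000 amount is flagged too. With 2 bins the $7,500 amount shares a bin with many normal amounts and is missed.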
Solution: Adopt kernel density estimation to estimate the probability density distribution of the data. If the estimated density at an object is high, the object is likely normal. Otherwise, it is likely an outlier.
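A minimal sketch of the kernel density approach follows. The slide does not specify a kernel, a bandwidth, or a threshold, so these are assumptions for illustration: a Gaussian kernel, the normal-reference rule of thumb for the bandwidth, and a cutoff at the 1% quantile of the densities at the training points. The function names and the synthetic data are hypothetical as well.

```python
import numpy as np

def gaussian_kde_density(train, query, bandwidth=None):
    """Estimate the density at each query point with a Gaussian kernel."""
    train = np.asarray(train, dtype=float)
    query = np.asarray(query, dtype=float)
    n = train.size
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption, not from the slides).
        bandwidth = 1.06 * train.std(ddof=1) * n ** (-1 / 5)
    # Pairwise scaled distances, shape (len(query), len(train)).
    z = (query[:, None] - train[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (n * bandwidth)

def kde_outlier_flags(train, query, quantile=0.01):
    """Flag query points whose density is below a low quantile of training densities."""
    threshold = np.quantile(gaussian_kde_density(train, train), quantile)
    return gaussian_kde_density(train, query) < threshold

# Hypothetical data: 999 amounts spread evenly over $0-$5,000, plus one at $7,500.
amounts = np.concatenate([np.linspace(0.0, 5000.0, 999), [7500.0]])
query = np.array([2000.0, 7500.0])

densities = gaussian_kde_density(amounts, query)
flags = kde_outlier_flags(amounts, query)
print(densities)  # density at $2,000 is far higher than at $7,500
print(flags)      # only $7,500 is flagged
```

The bandwidth plays a role similar to the bin size, but the estimate is smooth and does not depend on where bin edges happen to fall.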