Non-Parametric Methods: Detection Using Histogram

  • The model of normal data is learned from the input data without any a priori structure.

  • Often make fewer assumptions about the data, and thus are applicable in more scenarios
  • Outlier detection using histogram:
  • Figure shows the histogram of purchase amounts in transactions
  • A transaction in the amount of $7,500 is an outlier, since only 0.2% of transactions have an amount higher than $5,000
  • Problem: It is hard to choose an appropriate bin size for the histogram
    • Bin size too small → normal objects fall into empty or rare bins (false positives)
    • Bin size too large → outliers fall into frequent bins (false negatives)
  • Solution: Adopt kernel density estimation to estimate the probability density distribution of the data. If the estimated density at an object is high, the object is likely normal; otherwise, it is likely an outlier.
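
The bin-size problem and the kernel density alternative can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the slide: the synthetic purchase amounts, the helper names (histogram_scores, gaussian_kde, outliers), the 1% rarity threshold, the bin widths, and the bandwidth of 100. The kernel is a plain Gaussian kernel written with the standard library only.

```python
import math


def histogram_scores(values, bin_width):
    """For each value, return the fraction of all values that share its bin."""
    lo = min(values)
    counts = {}
    for v in values:
        b = int((v - lo) // bin_width)
        counts[b] = counts.get(b, 0) + 1
    n = len(values)
    return [counts[int((v - lo) // bin_width)] / n for v in values]


def gaussian_kde(values, bandwidth):
    """Return a function that estimates the density at x with a Gaussian kernel."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density


def outliers(values, scores, threshold):
    """Return the values whose score falls below the threshold."""
    return [v for v, s in zip(values, scores) if s < threshold]


# Hypothetical data: 1,000 ordinary purchases between $100 and $2,090,
# plus a single $7,500 transaction.
amounts = [100.0 + 10.0 * (i % 200) for i in range(1000)] + [7500.0]
THRESHOLD = 0.01  # assumed cutoff: bins holding under 1% of the data are "rare"

# Moderate bins: only the $7,500 transaction lands in a rare bin.
print(outliers(amounts, histogram_scores(amounts, 500), THRESHOLD))

# Bins too large: everything shares one bin, so the outlier is missed.
print(outliers(amounts, histogram_scores(amounts, 10000), THRESHOLD))

# Bins too small: every bin is rare, so normal purchases are flagged too.
print(len(outliers(amounts, histogram_scores(amounts, 1), THRESHOLD)))

# Kernel density estimate: compare the density at a typical amount
# with the density at the suspicious one.
density = gaussian_kde(amounts, bandwidth=100.0)
print(density(1000.0), density(7500.0))
```

The kernel density estimate avoids hard bin edges, but the bandwidth still has to be chosen; a very small or very large bandwidth causes problems similar to those of a badly chosen bin size.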
