What Are Outliers?

  • Outlier: A data object that deviates significantly from the normal objects as if it were generated by a different mechanism
    • Ex.: an unusual credit card purchase; in sports: Michael Jordan, Wayne Gretzky, ...
  • Outliers are different from noise
    • Noise is random error or variance in a measured variable
    • Noise should be removed before outlier detection
  • Outliers are interesting: they violate the mechanism that generates the normal data
  • Outlier detection vs. novelty detection: a novel pattern is flagged as an outlier at an early stage, but once it is accepted as normal it is merged into the model
  • Applications:
    • Credit card fraud detection
    • Telecom fraud detection
    • Customer segmentation
    • Medical analysis
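To make "deviates significantly from the normal objects" concrete, below is a minimal sketch of one simple statistical detector: the modified z-score, which measures distance from the median in units of the median absolute deviation (MAD). It is only one technique among many and is not presented as the method these notes prescribe. The function name `robust_outliers`, the purchase amounts, and the 3.5 cutoff (a common rule of thumb) are illustrative assumptions.

```python
from statistics import median


def robust_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation from the median. Median and MAD are used instead of
    mean and standard deviation so that the outliers themselves do not
    inflate the estimate of "normal" spread.
    """
    med = median(values)
    mad = median(abs(x - med) for x in values)
    if mad == 0:
        # More than half the values equal the median: treat any value
        # that differs from the median as deviating.
        return [i for i, x in enumerate(values) if x != med]
    return [
        i for i, x in enumerate(values)
        if abs(0.6745 * (x - med) / mad) > threshold
    ]


# Hypothetical credit card purchase amounts; one is far from the rest.
purchases = [23.5, 41.0, 18.2, 36.9, 27.4, 30.1, 22.8, 2450.0, 33.3, 25.6]
print(robust_outliers(purchases))  # → [7]
```

A single fixed threshold on one variable is the simplest possible setting; real applications such as fraud detection typically work with many attributes and context, so this sketch only illustrates the idea of measuring deviation from the bulk of the data.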

