How to Handle Noisy Data?

  • Binning
    • first sort data and partition into (equal-frequency) bins
    • then smooth each bin by its mean, its median, or its boundaries (replace each value with the closer of the bin's min and max), etc.
  • Regression
    • smooth by fitting the data to regression functions and replacing observed values with fitted ones
  • Clustering
    • group similar values into clusters; detect and remove outliers (values that fall outside the clusters)
  • Combined computer and human inspection
    • let the computer detect suspicious values, then have a human check them (e.g., deal with possible outliers)
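
The binning step can be sketched in Python as follows. This is a minimal sketch, assuming equal-frequency bins of `len(values) // n_bins` values each (the first bins take one extra value when the split is uneven) and `n_bins <= len(values)`. For smoothing by boundaries, each value goes to whichever bin boundary is closer, with ties going to the lower boundary (an assumption). The function names and the price list are illustrative, not from the original slide.

```python
from statistics import mean, median


def equal_frequency_bins(values, n_bins):
    """Sort the values and split them into n_bins bins of (nearly) equal size."""
    data = sorted(values)
    size, extra = divmod(len(data), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        # The first `extra` bins take one additional value when the split is uneven.
        end = start + size + (1 if i < extra else 0)
        bins.append(data[start:end])
        start = end
    return bins


def smooth_by_means(bins):
    """Replace every value in a bin with the bin mean."""
    return [[mean(b)] * len(b) for b in bins]


def smooth_by_medians(bins):
    """Replace every value in a bin with the bin median."""
    return [[median(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value with the closer of its bin's min and max (ties go to the min)."""
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]  # bins come from sorted data, so the ends are the boundaries
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed


# Hypothetical, unsorted sample prices.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34, 9, 26, 29]
bins = equal_frequency_bins(prices, 3)
print(bins)                        # [[4, 8, 9, 15], [21, 21, 24, 25], [26, 28, 29, 34]]
print(smooth_by_boundaries(bins))  # [[4, 4, 4, 15], [21, 21, 25, 25], [26, 26, 26, 34]]
```

Smoothing by means or medians flattens each bin to a single value; smoothing by boundaries keeps the bin's range but removes the variation inside it.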

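Smoothing by regression can be sketched the same way. This sketch assumes a single predictor `x` and a straight-line model fitted by ordinary least squares; the slide does not fix the form of the regression function, and multiple or nonlinear regression would follow the same pattern of replacing each observation with its fitted value. The function names and measurements are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # requires at least two distinct x values
    intercept = mean_y - slope * mean_x
    return intercept, slope


def smooth_by_regression(xs, ys):
    """Replace each observed y with the value predicted by the fitted line."""
    intercept, slope = fit_line(xs, ys)
    return [intercept + slope * x for x in xs]


# Hypothetical noisy measurements of y against x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print([round(y, 2) for y in smooth_by_regression(xs, ys)])  # [2.8, 3.4, 4.0, 4.6, 5.2]
```

The fitted values follow the overall trend, so point-to-point fluctuations (the noise) are removed.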

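Outlier detection by clustering can be sketched with a simple one-dimensional k-means. This is one possible choice, not necessarily the method the slide has in mind. Several details are assumptions: the deterministic start (evenly spaced order statistics), the number of clusters `k`, and the rule that members of clusters smaller than `min_size` count as outliers. The function names and readings are also hypothetical.

```python
def assign(values, centroids):
    """Put each value in the cluster of its nearest centroid (ties go to the lower index)."""
    clusters = [[] for _ in centroids]
    for v in values:
        nearest = min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
        clusters[nearest].append(v)
    return clusters


def kmeans_1d(values, k, max_iter=100):
    """Plain k-means on a list of numbers; assumes 2 <= k <= len(values)."""
    data = sorted(values)
    n = len(data)
    # Deterministic start: k evenly spaced order statistics of the data.
    centroids = [data[i * (n - 1) // (k - 1)] for i in range(k)]
    for _ in range(max_iter):
        clusters = assign(values, centroids)
        # Move each centroid to its cluster mean; an empty cluster keeps its old centroid.
        new = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, assign(values, centroids)


def cluster_outliers(values, k, min_size=2):
    """Flag members of clusters with fewer than min_size values as outliers."""
    _, clusters = kmeans_1d(values, k)
    return [v for c in clusters if len(c) < min_size for v in c]


def remove_outliers(values, k, min_size=2):
    """Drop every value that cluster_outliers flags."""
    flagged = set(cluster_outliers(values, k, min_size))
    return [v for v in values if v not in flagged]


# Hypothetical sensor readings with one suspicious spike.
readings = [10, 11, 50, 12, 51, 10, 200, 49, 11, 52]
print(cluster_outliers(readings, k=3))  # [200]
print(remove_outliers(readings, k=3))   # [10, 11, 50, 12, 51, 10, 49, 11, 52]
```

Treating tiny clusters as outliers is only one heuristic. Another is to flag points that lie far from their assigned centroid. In practice the flagged values could also be handed to a person for review instead of being dropped automatically.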