Traditional Distance Measures May Not Be Effective on High-D Data

  • Traditional distance measures can be dominated by noise when the data have many dimensions
  • Ex. Which pairs of customers are more similar?
  • By Euclidean distance, all three pairs come out equally far apart:

\[dist(Ada,Bob)=dist(Bob,Cathy)=dist(Ada,Cathy)=\sqrt{2}\]

    • even though Ada and Bob look less similar than the other two pairs
  • Clustering should not only consider dimensions but also attributes (features)
    • Feature transformation: effective if most dimensions are relevant (PCA & SVD useful when features are highly correlated/redundant)
    • Feature selection: useful for finding a subspace in which the data form clear clusters
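
The Euclidean-distance tie above can be reproduced with a small sketch. The slide's customer table is not available here, so the 0/1 purchase vectors below are hypothetical values chosen only so that every pair ends up at distance √2. The helper names `euclidean` and `shared_items` are also illustrative assumptions.

```python
import math
from itertools import combinations

# Hypothetical purchase vectors over 10 products (1 = bought, 0 = not bought).
# These values are an assumption for illustration, not the slide's actual table.
customers = {
    "Ada":   [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "Bob":   [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    "Cathy": [1, 0, 0, 0, 1, 0, 0, 0, 0, 1],
}

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def shared_items(u, v):
    """Number of products that both customers bought (dot product of 0/1 vectors)."""
    return sum(a * b for a, b in zip(u, v))

distances = {}
shared = {}
for a, b in combinations(customers, 2):
    distances[(a, b)] = euclidean(customers[a], customers[b])
    shared[(a, b)] = shared_items(customers[a], customers[b])
    print(f"{a}-{b}: dist = {distances[(a, b)]:.4f}, shared purchases = {shared[(a, b)]}")
```

With these assumed vectors, every pair differs in exactly two products, so all three distances are equal. The shared-purchase count is what separates the pairs: Ada and Bob have nothing in common, while each of them shares one product with Cathy. The many dimensions in which everyone has a 0 add nothing to the distance and do not help tell the pairs apart.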

