Traditional Distance Measures May Not Be Effective on High-D Data
- Traditional distance measures can be dominated by noise in many dimensions
- Ex. Which pairs of customers are more similar?

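- By Euclidean distance, all three pairs come out equally far apart:
\[\mathrm{dist}(\text{Ada},\text{Bob})=\mathrm{dist}(\text{Bob},\text{Cathy})=\mathrm{dist}(\text{Ada},\text{Cathy})=\sqrt{2}\]
- even though Ada and Bob look less similar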
- Clustering should not only consider dimensions but also attributes (features)
- Feature transformation: effective if most dimensions are relevant (PCA and SVD are useful when features are highly correlated or redundant)
- Feature selection: useful for finding a subspace in which the data form clear clusters
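The equal-distance effect and the feature-selection remedy can be sketched in a few lines of Python. The purchase vectors below are hypothetical stand-ins chosen so that every pair is \(\sqrt{2}\) apart (they are not taken from the slide's table), and the subspace of dimensions 0 and 9 is likewise an assumption picked for illustration.

```python
import math
from itertools import combinations

# Hypothetical binary purchase vectors over 10 products (1 = bought).
# These are illustrative values, not the data from the slide's table.
customers = {
    "Ada":   [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "Bob":   [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    "Cathy": [1, 0, 0, 0, 1, 0, 0, 0, 0, 1],
}

def euclidean(u, v, dims=None):
    """Euclidean distance, optionally restricted to a subset of dimension indices."""
    if dims is None:
        dims = range(len(u))
    return math.sqrt(sum((u[i] - v[i]) ** 2 for i in dims))

def pairwise(dims=None):
    """Distances for every pair of customers, keyed by (name, name)."""
    return {
        (a, b): euclidean(customers[a], customers[b], dims)
        for a, b in combinations(customers, 2)
    }

full = pairwise()             # all 10 dimensions
sub = pairwise(dims=[0, 9])   # assumed subspace: first and last product only

for pair in full:
    print(pair, round(full[pair], 3), round(sub[pair], 3))
```

In the full space all three distances coincide, so the measure cannot rank the pairs. Restricted to the assumed two-dimensional subspace, Ada and Bob end up farther apart than either is from Cathy, which is the kind of structure feature selection aims to expose.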