Sampling

  • Sampling: obtaining a small sample s to represent the whole data set N
  • Allows a mining algorithm to run with complexity that is potentially sub-linear in the size of the data
  • Key principle: Choose a representative subset of the data
    • Simple random sampling may perform very poorly in the presence of skew
    • Develop adaptive sampling methods, e.g., stratified sampling, which draws a sample from each subgroup (stratum) of the data separately
  • Note: Sampling may not reduce database I/Os, since data is read a page at a time
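
The contrast between simple random sampling and stratified sampling on skewed data can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper names (simple_random_sample, stratified_sample), the toy data set, and the choice of proportional allocation with at least one row per stratum are all assumptions made for the example.

```python
import random
from collections import defaultdict


def simple_random_sample(rows, s, rng):
    """Draw s rows uniformly at random, without replacement."""
    return rng.sample(rows, s)


def stratified_sample(rows, s, key, rng):
    """Sample each stratum separately (hypothetical helper).

    Assumption: proportional allocation, but every stratum contributes
    at least one row, so the result can be slightly larger than s.
    """
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)

    n = len(rows)
    sample = []
    for group in strata.values():
        k = max(1, round(s * len(group) / n))  # proportional share, minimum 1
        k = min(k, len(group))                 # never ask for more than exists
        sample.extend(rng.sample(group, k))
    return sample


rng = random.Random(0)

# Skewed toy data (assumed): 990 rows of class "common", 10 of class "rare".
rows = [("common", i) for i in range(990)] + [("rare", i) for i in range(10)]

srs = simple_random_sample(rows, 20, rng)
strat = stratified_sample(rows, 20, key=lambda r: r[0], rng=rng)

rare_in_srs = sum(1 for cls, _ in srs if cls == "rare")
rare_in_strat = sum(1 for cls, _ in strat if cls == "rare")
print("rare rows in simple random sample:", rare_in_srs)
print("rare rows in stratified sample:  ", rare_in_strat)
```

With only 1% of the rows in the rare class, a simple random sample of 20 rows will often contain none of them, whereas the stratified sample always contains at least one by construction.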
