Attribute Subset Selection

  • Another way to reduce the dimensionality of the data: remove redundant or irrelevant attributes
  • Redundant attributes
    • Duplicate much or all of the information contained in one or more other attributes
    • E.g., purchase price of a product and the amount of sales tax paid
  • Irrelevant attributes
    • Contain no information that is useful for the data mining task at hand
    • E.g., a student's ID is often irrelevant to the task of predicting the student's GPA
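
The two examples above can be illustrated with a minimal sketch, assuming Pearson correlation as a rough screening measure (this is one possible heuristic, not the only way to select attributes). All data values, the flat 8% tax rate, and the helper name pearson are made up for illustration. A correlation near 1 between two attributes suggests one of them is redundant; a correlation near 0 between an attribute and the target hints that the attribute may be irrelevant, although Pearson correlation only captures linear relationships.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant attribute carries no linear signal
    return cov / sqrt(var_x * var_y)


# Hypothetical data: sales tax is an assumed flat 8% of the purchase price,
# so it duplicates the information already contained in the price.
price = [10.0, 25.0, 40.0, 80.0, 120.0]
sales_tax = [0.08 * p for p in price]

# Hypothetical data: sequential student IDs and unrelated GPA values.
student_id = [1001, 1002, 1003, 1004, 1005, 1006]
gpa = [3.0, 3.6, 2.8, 2.8, 3.6, 3.0]

redundancy = pearson(price, sales_tax)  # close to 1: redundant pair
relevance = pearson(student_id, gpa)    # close to 0: ID tells us nothing here

print(f"price vs. sales tax: {redundancy:.3f}")
print(f"student ID vs. GPA:  {abs(relevance):.3f}")
```

In practice a threshold on the absolute correlation would have to be chosen for the data at hand, and a low linear correlation alone does not prove that an attribute is irrelevant.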

