Feature Selection: How?

  • Two ideas:

    • Hypothesis testing statistics:

      • Are we confident that the value of one categorical variable is associated with the value of another?

      • Chi-square test (χ²)

    • Information theory:

      • How much information does the value of one categorical variable give you about the value of another?

      • Mutual information

  • They’re similar, but χ² measures confidence that an association exists (based on the available sample statistics), while MI measures the extent of the association (assuming perfect knowledge of the probabilities)
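
The contrast between the two measures can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `chi_square` and `mutual_information` and the two example tables are made up for this sketch, and the MI here is the simple plug-in estimate that treats observed frequencies as if they were the true probabilities. It also assumes every row and column total is positive.

```python
import math


def _totals(table):
    """Return (grand total, row totals, column totals) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return sum(row_totals), row_totals, col_totals


def chi_square(table):
    """Pearson chi-square statistic for a contingency table of counts.

    Assumes every row and column total is positive (hypothetical helper).
    """
    n, row_totals, col_totals = _totals(table)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def mutual_information(table):
    """Plug-in mutual information in bits.

    Treats the observed frequencies as the true probabilities (an assumption).
    """
    n, row_totals, col_totals = _totals(table)
    mi = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed == 0:
                continue  # a zero cell contributes nothing to the sum
            p_xy = observed / n
            p_x = row_totals[i] / n
            p_y = col_totals[j] / n
            mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


# Two made-up tables with identical proportions but different sample sizes.
small = [[12, 8], [8, 12]]          # 40 observations
large = [[120, 80], [80, 120]]      # 400 observations

for name, table in (("small", small), ("large", large)):
    print(name, chi_square(table), mutual_information(table))
```

Both tables have the same cell proportions, so the plug-in MI is the same for each. The χ² statistic, however, grows from about 1.6 to about 16. With one degree of freedom the 5% critical value is roughly 3.84, so only the larger sample gives confidence that the association is real, even though its estimated extent has not changed.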

