
Computation of Gini Index

  • Example: D has 14 tuples, 9 with buys_computer = “yes” and 5 with buys_computer = “no”

\[gini(D)=1-\left ( \frac{9}{14} \right )^{2}-\left ( \frac{5}{14} \right )^{2}=0.459\]
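A minimal Python sketch of this computation, assuming the class labels of D are given as a plain list of strings. The function name `gini` is illustrative and does not come from any library.

```python
from collections import Counter

def gini(labels):
    """Gini index of a collection of class labels: 1 - sum_i p_i^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)  # number of tuples per class
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# D: 9 tuples with buys_computer = "yes" and 5 with "no"
D = ["yes"] * 9 + ["no"] * 5
print(round(gini(D), 3))  # 0.459
```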

  • Suppose the attribute income partitions D into 10 tuples in D1 (income ∈ {low, medium}) and 4 tuples in D2 (income ∈ {high})

\[
\begin{aligned}
gini_{income \in \{low,medium\}}(D) &= \frac{10}{14}\,gini(D_{1}) + \frac{4}{14}\,gini(D_{2})\\
&= \frac{10}{14}\left(1-\left(\frac{7}{10}\right)^{2}-\left(\frac{3}{10}\right)^{2}\right) + \frac{4}{14}\left(1-\left(\frac{2}{4}\right)^{2}-\left(\frac{2}{4}\right)^{2}\right)\\
&= 0.443 = gini_{income \in \{high\}}(D)
\end{aligned}
\]
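The weighted index of a binary split can be sketched the same way. This version works from per-class (yes, no) counts rather than raw tuples; `gini_from_counts` and `gini_split` are hypothetical helper names chosen for illustration.

```python
def gini_from_counts(counts):
    """Gini index from per-class counts: 1 - sum_i (c_i / n)^2."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_split(partitions):
    """Weighted Gini of a split: sum_j |D_j| / |D| * gini(D_j)."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * gini_from_counts(p) for p in partitions)

# D1 = income in {low, medium}: 7 yes, 3 no; D2 = income in {high}: 2 yes, 2 no
print(round(gini_split([(7, 3), (2, 2)]), 3))  # 0.443
```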

    • Similarly, the Gini index is 0.458 for the split {low, high} vs. {medium} and 0.450 for {medium, high} vs. {low}. Thus, split on {low, medium} (and {high}), since it has the lowest Gini index
  • All attributes are assumed continuous-valued
  • May need other tools, e.g., clustering, to get the possible split values
  • Can be modified for categorical attributes
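For a categorical attribute such as income, one way to apply the Gini index is to evaluate every binary partition of its values and keep the one with the lowest weighted index, which is how the three income splits above are compared. The sketch below assumes per-value counts taken from the standard 14-tuple buys_computer training set (low: 3 yes/1 no, medium: 4 yes/2 no, high: 2 yes/2 no); the slide itself only gives the merged counts, though these values reproduce its 0.443, 0.458 and 0.450 figures. The helpers `gini_from_counts`, `best_binary_split` and `fmt` are hypothetical.

```python
from itertools import combinations

def gini_from_counts(counts):
    """Gini index from per-class counts: 1 - sum_i (c_i / n)^2."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)

def best_binary_split(value_counts):
    """Evaluate every binary partition of a categorical attribute's values.

    value_counts maps each attribute value to its per-class counts.
    Returns (best, all_splits); each split is (weighted_gini, left, right).
    """
    values = list(value_counts)
    n_classes = len(next(iter(value_counts.values())))
    total = sum(sum(c) for c in value_counts.values())
    splits = []
    # Keeping values[0] on the left lists each {S, complement} pair once,
    # giving 2**(k-1) - 1 candidate splits for k distinct values.
    for r in range(len(values) - 1):
        for others in combinations(values[1:], r):
            left = (values[0],) + others
            right = tuple(v for v in values if v not in left)
            score = 0.0
            for side in (left, right):
                # Merge the class counts of all values on this side of the split.
                counts = [sum(value_counts[v][i] for v in side) for i in range(n_classes)]
                score += sum(counts) / total * gini_from_counts(counts)
            splits.append((score, left, right))
    # Tuples compare by their first element, so min() picks the lowest Gini.
    return min(splits), splits

def fmt(side):
    return "{" + ", ".join(side) + "}"

# Assumed (yes, no) counts per income value, from the standard buys_computer data.
income = {"low": (3, 1), "medium": (4, 2), "high": (2, 2)}
best, all_splits = best_binary_split(income)
for score, left, right in all_splits:
    print(f"{fmt(left)} vs {fmt(right)}: {score:.3f}")
# {low} vs {medium, high}: 0.450
# {low, medium} vs {high}: 0.443
# {low, high} vs {medium}: 0.458
print(f"best: {fmt(best[1])} vs {fmt(best[2])}")  # best: {low, medium} vs {high}
```

Fixing the first value on the left side avoids scoring both a subset and its complement, which describe the same split.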
