Computation of Gini Index
- Example: D has 9 tuples with buys_computer = “yes” and 5 with buys_computer = “no”
\[gini(D)=1-\left ( \frac{9}{14} \right )^{2}-\left ( \frac{5}{14} \right )^{2}=0.459\]
- Suppose the attribute income partitions D into D1 (10 tuples with income in {low, medium}) and D2 (4 tuples with income in {high})
\[gini_{income \in \left \{ low,medium \right \}}(D)=\left ( \frac{10}{14} \right )gini(D_{1})+\left ( \frac{4}{14} \right )gini(D_{2})\]
\[=\frac{10}{14}\left (1-\left ( \frac{7}{10} \right )^{2}-\left ( \frac{3}{10} \right )^{2} \right )+\frac{4}{14}\left (1-\left ( \frac{2}{4} \right )^{2}-\left ( \frac{2}{4} \right )^{2} \right )\]
\[=0.443=gini_{income \in \left \{ high \right \}}(D)\]
- The Gini index for the split on {low, high} (and {medium}) is 0.458; for the split on {medium, high} (and {low}) it is 0.450. Thus, split on {low, medium} (and {high}), since it has the lowest Gini index (0.443)
- All attributes are assumed continuous-valued
- May need other tools, e.g., clustering, to get the possible split values
- Can be modified for categorical attributes
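The arithmetic in the worked example can be checked with a short sketch. It assumes the usual definition of Gini impurity, one minus the sum of squared class proportions, and a size-weighted average over the partitions of a split. The function names `gini` and `gini_split` are illustrative only, and the class counts (9/5 for D, 7/3 for D1, 2/2 for D2) are read off the fractions in the formulas above.

```python
def gini(counts):
    """Gini impurity of a node, given its per-class tuple counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_split(partitions):
    """Size-weighted Gini index of a split.

    `partitions` is a list of per-class count lists, one per partition.
    """
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * gini(p) for p in partitions)


# D: 9 "yes" and 5 "no" tuples
gini_d = gini([9, 5])

# income in {low, medium}: D1 has 7 "yes", 3 "no"; D2 ({high}) has 2 "yes", 2 "no"
gini_income = gini_split([[7, 3], [2, 2]])

print(round(gini_d, 3))       # 0.459
print(round(gini_income, 3))  # 0.443
```

Evaluating `gini_split` on the class counts of each candidate binary split and keeping the smallest value reproduces the selection step described on the slide.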