Gini Index (CART, IBM IntelligentMiner)

  • If a data set D contains examples from n classes, the Gini index gini(D) is defined as

\[gini(D)=1-\sum_{j=1}^{n}p_{j}^{2}\]

  • where p_j is the relative frequency of class j in D
  • If a data set D is split on attribute A into two subsets D1 and D2, the Gini index of the split, gini_A(D), is defined as

\[gini_{A}(D)=\frac{|D_{1}|}{|D|}gini(D_{1})+\frac{|D_{2}|}{|D|}gini(D_{2})\]

  • Reduction in Impurity:

\[\Delta gini(A)=gini(D)-gini_{A}(D)\]

  • The attribute that provides the smallest gini_A(D) (equivalently, the largest reduction in impurity, since gini(D) is fixed for the node) is chosen to split the node (this requires enumerating all possible splitting points for each attribute)
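A minimal Python sketch of the three formulas and the split search, assuming class labels are given as a list and the attribute A is numeric. The function names (`gini`, `gini_split`, `best_numeric_split`) are hypothetical, and using midpoints between consecutive distinct sorted values as the candidate split points is an assumption for illustration; categorical attributes are not handled here.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    """gini(D) = 1 - sum_j p_j^2, where p_j is the relative frequency of class j in D."""
    total = len(labels)
    if total == 0:
        return 0.0  # treat an empty set as pure
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def gini_split(left: Sequence[Hashable], right: Sequence[Hashable]) -> float:
    """gini_A(D): size-weighted Gini index of a binary split of D into D1 (left) and D2 (right)."""
    total = len(left) + len(right)
    return len(left) / total * gini(left) + len(right) / total * gini(right)


def best_numeric_split(
    values: Sequence[float], labels: Sequence[Hashable]
) -> Optional[Tuple[float, float, float]]:
    """Enumerate every candidate threshold of one numeric attribute A and return
    (threshold, gini_A(D), delta_gini) for the split with the smallest gini_A(D),
    or None if no split is possible.

    Assumption: candidate thresholds are midpoints between consecutive distinct
    sorted values; real implementations may choose split points differently.
    """
    pairs = sorted(zip(values, labels))
    parent = gini([label for _, label in pairs])  # gini(D) for the current node
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal attribute values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]   # D1: values <= threshold
        right = [label for _, label in pairs[i:]]  # D2: values > threshold
        impurity = gini_split(left, right)
        if best is None or impurity < best[1]:
            # delta_gini(A) = gini(D) - gini_A(D)
            best = (threshold, impurity, parent - impurity)
    return best
```

For example, a set with 9 examples of one class and 5 of another gives `gini(["yes"] * 9 + ["no"] * 5)` = 1 - (9/14)^2 - (5/14)^2 = 90/196 ≈ 0.459, and `best_numeric_split([1, 2, 3, 4], ["a", "a", "b", "b"])` returns `(2.5, 0.0, 0.5)`: the threshold 2.5 yields two pure subsets, so the impurity drops from 0.5 to 0.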
