Naïve Bayes Classifier

  • A simplified assumption: attributes are conditionally independent given the class (i.e., no dependence relation between attributes within a class):

\[P(X|C_{i})=\prod_{k=1}^{n}P(x_{k}|C_{i})=P(x_{1}|C_{i})\times P(x_{2}|C_{i})\times \cdots \times P(x_{n}|C_{i})\]

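  • This greatly reduces the computation cost: only the class distribution needs to be counted (how often each attribute value occurs within each class), not the joint distribution of all attributes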
  • If Ak is categorical, P(xk|Ci) is the number of tuples in Ci having value xk for Ak, divided by |Ci,D| (the number of tuples of Ci in D)
  • If Ak is continuous-valued, P(xk|Ci) is usually computed from a Gaussian distribution with mean μ and standard deviation σ

\[g(x,\mu,\sigma)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^{2}}{2\sigma ^{2}}}\]

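    • and P(xk|Ci) is

\[P(x_{k}|C_{i})=g(x_{k},\mu_{C_{i}},\sigma_{C_{i}})\]

      where μCi and σCi are the mean and standard deviation of the values of Ak over the training tuples of class Ci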
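
The counting estimate for categorical attributes, combined with the product from the independence assumption, can be sketched in Python as below. This is a minimal illustration, not a reference implementation: the function names `fit_categorical` and `likelihood` and the toy data are hypothetical, and no smoothing is applied, so an attribute value never seen in a class drives that class's likelihood to zero.

```python
from collections import Counter, defaultdict
from math import prod

def fit_categorical(tuples, labels):
    """Count |C_i,D| per class and, per class and attribute A_k, how often each value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, k) -> Counter of values of A_k within that class
    for x, c in zip(tuples, labels):
        for k, v in enumerate(x):
            value_counts[(c, k)][v] += 1
    return class_counts, value_counts

def likelihood(x, c, class_counts, value_counts):
    """P(X|C_i) as the product over k of (# tuples in C_i with value x_k for A_k) / |C_i,D|."""
    n_c = class_counts[c]
    return prod(value_counts[(c, k)][v] / n_c for k, v in enumerate(x))

# Hypothetical toy training set: tuples of (age, income) with class labels
data = [("youth", "high"), ("youth", "low"), ("senior", "low"), ("senior", "low")]
labels = ["no", "yes", "yes", "yes"]
class_counts, value_counts = fit_categorical(data, labels)

p_yes = likelihood(("senior", "low"), "yes", class_counts, value_counts)  # (2/3) * (3/3)
p_no = likelihood(("senior", "low"), "no", class_counts, value_counts)    # "senior" never seen in "no" -> 0
```

Only the per-class value counts are stored, which is the cost reduction the independence assumption buys.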



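For continuous-valued attributes, the Gaussian estimate can be sketched in the same way. The helper names `gaussian` and `gaussian_likelihood` and the toy values are hypothetical, and using the sample standard deviation (`statistics.stdev`, divisor n-1) rather than the population one (`statistics.pstdev`) is an assumption, since the slide does not say which estimator to use.

```python
import math
from statistics import mean, stdev

def gaussian(x, mu, sigma):
    """g(x, mu, sigma) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def gaussian_likelihood(x_k, values_in_class):
    """P(x_k|C_i) = g(x_k, mu_Ci, sigma_Ci), estimated from the A_k values seen in class C_i."""
    mu = mean(values_in_class)
    # Assumption: sample standard deviation (divisor n - 1); requires at least two values
    sigma = stdev(values_in_class)
    return gaussian(x_k, mu, sigma)

# Hypothetical values of a continuous attribute A_k (e.g. age) among the tuples of class C_i
ages_in_class = [30, 35, 40]  # mean 35, sample standard deviation 5
p = gaussian_likelihood(35, ages_in_class)
```

Because g is a density, the value obtained this way can exceed 1 when σ is small; it is only used to compare classes, so this does not affect the classification.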