Model-Based Clustering

  • A set C of k probabilistic clusters C1, …, Ck with probability density functions f1, …, fk, respectively, and their probabilities ω1, …, ωk.
  • The probability that an object o is generated by cluster Cj is

      $P(o \mid C_j) = \omega_j f_j(o)$

  • The probability that o is generated by the set of clusters C is

      $P(o \mid C) = \sum_{j=1}^{k} \omega_j f_j(o)$

  • Since objects are assumed to be generated independently, for a data set D = {o1, …, on} we have

      $P(D \mid C) = \prod_{i=1}^{n} P(o_i \mid C) = \prod_{i=1}^{n} \sum_{j=1}^{k} \omega_j f_j(o_i)$

  • Task: Find a set C of k probabilistic clusters such that P(D|C) is maximized
  • However, maximizing P(D|C) is often intractable, since the probability density function of a cluster can take an arbitrarily complicated form
  • To make the problem computationally feasible (as a compromise), assume that the probability density functions are parameterized distributions
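
As a rough illustration of the objective above, the following sketch evaluates log P(D|C) for a small one-dimensional data set. It assumes, purely for the example, that each fj is a univariate Gaussian with parameters (mu, sigma); the function names, data values, and parameter settings are hypothetical and not taken from the slides. The logarithm is used only because a product of many small probabilities is numerically awkward; maximizing log P(D|C) is equivalent to maximizing P(D|C).

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a univariate Gaussian N(mu, sigma^2) at x (assumed form of f_j)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(data, weights, params):
    """Return log P(D|C) = sum_i log( sum_j w_j * f_j(o_i) )."""
    total = 0.0
    for o in data:
        # P(o|C): mixture of the k cluster densities, weighted by w_j
        p_o = sum(w * gaussian_pdf(o, mu, sigma)
                  for w, (mu, sigma) in zip(weights, params))
        total += math.log(p_o)
    return total


# Hypothetical data: two visible groups, around -1 and around 3
data = [-1.1, -0.9, -1.0, 2.9, 3.0, 3.2]
weights = [0.5, 0.5]

well_placed = [(-1.0, 0.2), (3.0, 0.2)]    # clusters centred on the groups
badly_placed = [(0.0, 0.2), (1.0, 0.2)]    # clusters centred away from them

ll_good = log_likelihood(data, weights, well_placed)
ll_poor = log_likelihood(data, weights, badly_placed)
print(ll_good > ll_poor)
```

Under these assumptions the well-placed clusters give the higher likelihood, which is the sense in which the task "finds" clusters: it searches over the weights and distribution parameters for the setting that best explains D.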

