
Parametric Methods I: Detection of Univariate Outliers Based on the Normal Distribution

  • Univariate data: A data set involving only one attribute or variable
  • We often assume that the data are generated from a normal distribution, estimate the distribution's parameters from the input data, and identify points with low probability under the fitted model as outliers
  • Ex: Avg. temp.: {24.0, 28.9, 28.9, 29.0, 29.1, 29.1, 29.2, 29.2, 29.3, 29.4}
    • Use the maximum likelihood method to estimate μ and σ
\[ \ln L(\mu, \sigma^{2}) = \sum_{i=1}^{n} \ln f(x_{i} \mid \mu, \sigma^{2}) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^{2}) - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_{i}-\mu)^{2} \]
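As a sketch (Python is an assumed choice; both function names are hypothetical), the closed form of the log-likelihood above can be checked against a direct sum of ln f(x_i | μ, σ²), using the standard library's `statistics.NormalDist` for the normal density. The parameter values passed in are arbitrary trial values, not estimates.

```python
import math
from statistics import NormalDist

def log_likelihood_closed_form(xs, mu, sigma2):
    """ln L(mu, sigma^2) from the closed-form expression for i.i.d. normal data."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)  # sum of squared deviations from mu
    return (-n / 2 * math.log(2 * math.pi)
            - n / 2 * math.log(sigma2)
            - ss / (2 * sigma2))

def log_likelihood_direct(xs, mu, sigma2):
    """ln L(mu, sigma^2) as the sum of log-densities ln f(x_i | mu, sigma^2)."""
    dist = NormalDist(mu=mu, sigma=math.sqrt(sigma2))
    return sum(math.log(dist.pdf(x)) for x in xs)

temps = [24.0, 28.9, 28.9, 29.0, 29.1, 29.1, 29.2, 29.2, 29.3, 29.4]

# Arbitrary trial parameters: the two forms should agree up to floating-point rounding.
closed = log_likelihood_closed_form(temps, mu=29.0, sigma2=1.0)
direct = log_likelihood_direct(temps, mu=29.0, sigma2=1.0)
print(closed, direct)
```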
  • Taking derivatives with respect to μ and σ², setting them to zero, and solving, we obtain the following maximum likelihood estimates
\[ \hat{\mu}=\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_{i} \]
\[ \hat{\sigma}^{2}=\frac{1}{n}\sum_{i=1}^{n}(x_{i}-\bar{x})^{2} \]
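A minimal Python sketch of these two estimators follows (the helper name `normal_mle` is hypothetical). Note the 1/n divisor: the MLE of σ² is the population variance, which the standard library exposes as `statistics.pvariance`, not the unbiased 1/(n − 1) sample variance `statistics.variance`.

```python
import math
import statistics

def normal_mle(xs):
    """Return the maximum likelihood estimates (mu_hat, sigma2_hat) for a normal sample."""
    n = len(xs)
    mu_hat = sum(xs) / n                                 # sample mean
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n  # divide by n, not n - 1
    return mu_hat, sigma2_hat

temps = [24.0, 28.9, 28.9, 29.0, 29.1, 29.1, 29.2, 29.2, 29.3, 29.4]
mu_hat, sigma2_hat = normal_mle(temps)

# Cross-check against the standard library's population variance.
assert math.isclose(sigma2_hat, statistics.pvariance(temps))

print(f"mu_hat = {mu_hat:.2f}, sigma2_hat = {sigma2_hat:.4f}, sigma_hat = {math.sqrt(sigma2_hat):.2f}")
# mu_hat = 28.61, sigma2_hat = 2.3849, sigma_hat = 1.54
```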
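  • For the above data with n = 10, we have

 \[ \hat{\mu}=28.61 \]

 \[ \hat{\sigma}=\sqrt{2.38}=1.54 \]

  • Then (24.0 – 28.61)/1.54 ≈ –2.99: 24.0 lies about 3σ below the mean, while every other value lies within about 0.5σ of it. Since the μ ± 3σ region contains 99.7% of normally distributed data, 24.0 is by far the least probable value and is taken as the outlier, although it sits right on the –3σ boundary rather than clearly beyond it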
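The detection step can be sketched in Python as below, assuming the 3σ rule from this slide; the function name `zscore_outliers` and its `threshold` parameter are hypothetical. Recomputing from the data gives z ≈ –2.99 for 24.0, so whether it is flagged depends on where exactly the cutoff sits. This is expected for such a small sample: with the MLE σ̂ (1/n divisor), Samuelson's inequality bounds every |z| by √(n – 1), which is exactly 3 when n = 10.

```python
import math

def zscore_outliers(xs, threshold=3.0):
    """Fit a normal model by maximum likelihood and return (z_scores, outliers),
    where outliers are the values whose |z| exceeds `threshold`."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / n)  # MLE, 1/n divisor
    z_scores = [(x - mu_hat) / sigma_hat for x in xs]
    outliers = [x for x, z in zip(xs, z_scores) if abs(z) > threshold]
    return z_scores, outliers

temps = [24.0, 28.9, 28.9, 29.0, 29.1, 29.1, 29.2, 29.2, 29.3, 29.4]

z_scores, strict = zscore_outliers(temps, threshold=3.0)
print(f"z(24.0) = {z_scores[0]:.2f}")  # z(24.0) = -2.99
print(strict)                          # [] : 24.0 falls just inside the 3-sigma band
_, relaxed = zscore_outliers(temps, threshold=2.5)
print(relaxed)                         # [24.0]
```

The 2.5 cutoff is purely illustrative; choosing a different threshold is a modeling decision, not part of the method on this slide.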
