Measuring the Dispersion of Data

  • Quartiles, outliers and boxplots
    • Quartiles: Q1 (25th percentile), Q3 (75th percentile)
    • Inter-quartile range: IQR = Q3 – Q1
    • Five number summary: min, Q1, median, Q3, max
    • Boxplot: the ends of the box are the quartiles Q1 and Q3; the median is marked inside the box; whiskers are added, and outliers are plotted individually
    • Outlier: usually, a value more than 1.5 × IQR above Q3 or below Q1
  • Variance and standard deviation (sample: s, population: σ)
    • Variance: (algebraic, scalable computation)
\[ s^{2}=\frac{1}{n-1}\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}=\frac{1}{n-1}\left[\sum_{i=1}^{n}x_{i}^{2}-\frac{1}{n}\left(\sum_{i=1}^{n}x_{i}\right)^{2}\right] \]

\[ \sigma^{2}=\frac{1}{N}\sum_{i=1}^{N}(x_{i}-\mu)^{2}=\frac{1}{N}\sum_{i=1}^{N}x_{i}^{2}-\mu^{2} \]
    • Standard deviation s (or σ) is the square root of variance s² (or σ²)
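
The quartile, IQR, and outlier definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names `five_number_summary` and `iqr_outliers` and the sample data are hypothetical, and the quartiles use the "inclusive" linear-interpolation convention of the standard library's `statistics.quantiles`. Other quartile conventions exist and can give slightly different Q1 and Q3 values.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers.

    Assumption: quartiles follow the 'inclusive' interpolation method.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]


# Hypothetical sample data, used only for illustration.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]
print(five_number_summary(sample))  # (2, 4.0, 6.0, 8.0, 30)
print(iqr_outliers(sample))         # [30]
```

Here Q1 = 4 and Q3 = 8, so IQR = 4 and the outlier fences are 4 − 6 = −2 and 8 + 6 = 14; only the value 30 falls outside them.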

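The second form of each variance formula is what makes the computation "scalable": it needs only the running count, the sum, and the sum of squares, so the data can be read in a single pass. The sketch below compares it with the two-pass definition. The function names and the sample data are hypothetical, chosen for illustration only. One caveat worth stating: in floating-point arithmetic the one-pass form can lose precision when the mean is large relative to the spread.

```python
import math


def sample_variance_two_pass(xs):
    """s^2 from the definition: sum of squared deviations over (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_variance_one_pass(xs):
    """s^2 from running totals: (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in xs:  # a single pass over the data
        n += 1
        total += x
        total_sq += x * x
    return (total_sq - total * total / n) / (n - 1)


def population_variance(xs):
    """sigma^2 as the mean of squares minus the squared mean."""
    n = len(xs)
    mu = sum(xs) / n
    return sum(x * x for x in xs) / n - mu ** 2


# Hypothetical sample data, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
s2 = sample_variance_one_pass(data)
sigma2 = population_variance(data)
print(s2, math.sqrt(s2))          # sample variance and standard deviation s
print(sigma2, math.sqrt(sigma2))  # population variance and standard deviation
```

For this data the mean is 5 and the squared deviations sum to 32, so the sample variance is 32/7 and the population variance is 32/8 = 4, giving σ = 2.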
