112031" datareactid="16">
Outlier Detection (2): Proximity-Based Methods
 An object is an outlier if its nearest neighbors are far away, i.e., the proximity of the object deviates significantly from the proximity of most of the other objects in the same data set
 The effectiveness of proximity-based methods relies heavily on the proximity measure
 In some applications, proximity or distance measures cannot be obtained easily
 Often has difficulty finding a group of outliers that stay close to each other
 Two major types of proximity-based outlier detection
  Distance-based vs. density-based
 Example (right figure): Model the proximity of an object using its 3 nearest neighbors
  Objects in region R are substantially different from the other objects in the data set
  Thus the objects in R are outliers
Outlier Detection (3): Clustering-Based Methods
 Normal data belong to large and dense clusters, whereas outliers belong to small or sparse clusters, or do not belong to any cluster
 Since there are many clustering methods, there are many clustering-based outlier detection methods as well
 Clustering is expensive: straightforward adaptation of a clustering method for outlier detection can be costly and does not scale up well to large data sets
 Example (figure below): two clusters
  All points not in R form a large cluster
  The two points in R form a tiny cluster, and thus are outliers
Statistical Approaches
 Statistical approaches assume that the objects in a data set are generated by a stochastic process (a generative model)
 Idea: learn a generative model fitting the given data set, and then identify the objects in low probability regions of the model as outliers
 Methods are divided into two categories: parametric vs. non-parametric
 Parametric methods
  Assume that the normal data are generated by a parametric distribution with parameter θ
  The probability density function of the parametric distribution, f(x | θ), gives the probability that object x is generated by the distribution
  The smaller this value, the more likely x is an outlier
 Non-parametric methods
  Do not assume an a priori statistical model; instead, determine the model from the input data
  Not completely parameter-free: the number and nature of the parameters are flexible and not fixed in advance
  Examples: histogram and kernel density estimation
Parametric Methods I: Detecting Univariate Outliers Based on the Normal Distribution
 Univariate data: A data set involving only one attribute or variable
 Often assume that the data are generated from a normal distribution, learn the parameters from the input data, and identify the points with low probability as outliers
 Ex. Avg. temp.: {24.0, 28.9, 28.9, 29.0, 29.1, 29.1, 29.2, 29.2, 29.3, 29.4}
 Use the maximum likelihood method to estimate μ and σ
\[ \ln L(\mu, \sigma^{2}) = \sum_{i=1}^{n} \ln f(x_{i} \mid \mu, \sigma^{2}) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^{2}) - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_{i}-\mu)^{2} \]
 Taking derivatives with respect to μ and σ², we derive the following maximum likelihood estimates:
\[ \hat{\mu}=\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_{i} \]
\[ \hat{\sigma}^{2}=\frac{1}{n}\sum_{i=1}^{n}(x_{i}-\bar{x})^{2} \]
 For the above data with n = 10, we have
\[ \hat{\mu}=28.61 \]
\[ \hat{\sigma}=\sqrt{2.29}=1.51 \]
 Then (24 − 28.61)/1.51 = −3.04 < −3, so 24 is an outlier, since the μ ± 3σ region contains 99.7% of the data
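The MLE-plus-z-score procedure above can be sketched directly on the temperature data (a minimal illustration; variable names are ours):

```python
data = [24.0, 28.9, 28.9, 29.0, 29.1, 29.1, 29.2, 29.2, 29.3, 29.4]

n = len(data)
mu = sum(data) / n                              # MLE of the mean
sigma2 = sum((x - mu) ** 2 for x in data) / n   # MLE of the variance
sigma = sigma2 ** 0.5

# z-score of each point; |z| near or beyond 3 suggests an outlier
z = [(x - mu) / sigma for x in data]
```

With these estimates, 24.0 receives by far the most extreme z-score (about −3), while every other point stays well inside one standard deviation.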
Parametric Methods I: Grubbs' Test
 Univariate outlier detection: Grubbs' test (the maximum normed residual test) ─ another statistical method under the normal distribution assumption
 For each object x in a data set, compute its z-score; x is an outlier if
\[ z\geq \frac{N-1}{\sqrt{N}}\sqrt{\frac{t^{2}_{\alpha/(2N),\,N-2}}{N-2+t^{2}_{\alpha/(2N),\,N-2}}} \]
where
\[ t^{2}_{\alpha/(2N),\,N-2} \]
is the square of the value taken by a t-distribution with N − 2 degrees of freedom at a significance level of α/(2N), and N is the number of objects in the data set
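The critical value can be computed from the t-distribution; this sketch assumes SciPy is available for the t quantile (function names are ours):

```python
from statistics import mean, stdev
from scipy.stats import t  # assumes SciPy is available

def grubbs_critical(N, alpha=0.05):
    # critical value of the maximum normed residual test
    t2 = t.ppf(1 - alpha / (2 * N), N - 2) ** 2
    return (N - 1) / N ** 0.5 * (t2 / (N - 2 + t2)) ** 0.5

def grubbs_is_outlier(data, alpha=0.05):
    m, s = mean(data), stdev(data)
    z_max = max(abs(x - m) / s for x in data)   # maximum normed residual
    return z_max > grubbs_critical(len(data), alpha)
```

For the temperature data above (N = 10, α = 0.05), the critical value is about 2.29, and 24.0 exceeds it.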
Parametric Methods II: Detection of Multivariate Outliers
 Multivariate data: A data set involving two or more attributes or variables
 Transform the multivariate outlier detection task into a univariate outlier detection problem
 Method 1. Compute the Mahalanobis distance
  Let ō be the mean vector of a multivariate data set. The Mahalanobis distance from an object o to ō is
\[ MDist(o, \bar{o}) = (o-\bar{o})^{T} S^{-1}(o-\bar{o}) \]
where S is the covariance matrix
  Use Grubbs' test on this measure to detect outliers
 Method 2. Use the χ²-statistic:
\[ \chi^{2} = \sum_{i=1}^{n}\frac{(o_{i}-E_{i})^{2}}{E_{i}} \]
  where Eᵢ is the mean of the i-th dimension among all objects, and n is the dimensionality
  If the χ²-statistic is large, then object o is an outlier
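Method 1 can be sketched for two-dimensional data, inverting the 2×2 covariance matrix analytically (a minimal illustration, not a general implementation; the function name is ours):

```python
def mahalanobis_sq(o, mean, S):
    """Squared Mahalanobis distance (o - mean)^T S^{-1} (o - mean)
    for 2-D data, with S a 2x2 covariance matrix."""
    (a, b), (c, d) = S
    det = a * d - b * c                               # determinant of S
    inv = ((d / det, -b / det), (-c / det, a / det))  # analytic 2x2 inverse
    dx = (o[0] - mean[0], o[1] - mean[1])
    y = (inv[0][0] * dx[0] + inv[0][1] * dx[1],
         inv[1][0] * dx[0] + inv[1][1] * dx[1])
    return dx[0] * y[0] + dx[1] * y[1]
```

With S the identity matrix this reduces to the squared Euclidean distance; the resulting one-dimensional scores can then be fed to Grubbs' test.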
Parametric Methods III: Using a Mixture of Parametric Distributions
 Assuming the data are generated by a single normal distribution can sometimes be an oversimplification
 Example (figure below): The objects between the two clusters cannot be captured as outliers, since they are close to the estimated mean
 To overcome this problem, assume the normal data are generated by two normal distributions. For any object o in the data set, the probability that o is generated by the mixture of the two distributions is given by
\[ \Pr(o \mid \Theta_{1}, \Theta_{2})= f_{\Theta_{1}}(o)+ f_{\Theta_{2}}(o) \]
where f_Θ1 and f_Θ2 are the probability density functions of Θ1 and Θ2
 Then use the EM algorithm to learn the parameters μ1, σ1, μ2, σ2 from the data
 An object o is an outlier if it does not belong to any cluster
Non-Parametric Methods: Detection Using Histograms
 The model of normal data is learned from the input data without any a priori structure
 Often makes fewer assumptions about the data, and thus is applicable in more scenarios
 Outlier detection using a histogram:
  Figure shows the histogram of purchase amounts in transactions
  A transaction in the amount of $7,500 is an outlier, since only 0.2% of transactions have an amount higher than $5,000
 Problem: Hard to choose an appropriate bin size for the histogram
  Too small a bin size → normal objects fall into empty or rare bins: false positives
  Too large a bin size → outliers fall into some frequent bins: false negatives
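Histogram-based scoring can be sketched in a few lines; each value is scored by the rarity of its bin (function name and `bin_width` parameter are ours):

```python
from collections import Counter

def histogram_scores(values, bin_width):
    # map each value to a bin and score it by the rarity of that bin
    bins = [int(v // bin_width) for v in values]
    counts = Counter(bins)
    n = len(values)
    return [1 - counts[b] / n for b in bins]   # rarer bin -> higher score

amounts = [100, 120, 110, 130, 90, 115, 105, 7500]
scores = histogram_scores(amounts, bin_width=1000)
```

Here $7,500 lands alone in a high bin and receives the top score; shrinking `bin_width` too far would isolate normal amounts as well, which is exactly the false-positive risk noted above.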
 Solution: Adopt kernel density estimation to estimate the probability density distribution of the data. If the estimated density at an object is high, the object is likely normal; otherwise, it is likely an outlier
Proximity-Based Approaches: Distance-Based vs. Density-Based Outlier Detection
 Intuition: Objects that are far away from the others are outliers
 Assumption of proximity-based approaches: The proximity of an outlier deviates significantly from that of most of the others in the data set
 Two types of proximity-based outlier detection methods
  Distance-based outlier detection: An object o is an outlier if its neighborhood does not have enough other points
  Density-based outlier detection: An object o is an outlier if its density is relatively much lower than that of its neighbors
Distance-Based Outlier Detection
 For each object o, examine the number of other objects in the r-neighborhood of o, where r is a user-specified distance threshold
 An object o is an outlier if most of the objects in D (taking π as a fraction threshold) are far away from o, i.e., not in the r-neighborhood of o
 An object o is a DB(r, π) outlier if
\[ \frac{\left \| \{ o^{'} \mid dist(o,o^{'})\leq r \} \right \|}{\left \| D \right \|}\leq \pi \]
 Equivalently, one can check the distance between o and its k-th nearest neighbor o_k, where
\[ k=\left \lceil \pi \left \| D \right \| \right \rceil \]
o is an outlier if dist(o, o_k) > r
 Efficient computation: the nested loop algorithm
  For each object o_i, calculate its distance to the other objects, and count the number of other objects in its r-neighborhood
  If π·n other objects are within distance r, terminate the inner loop
  Otherwise, o_i is a DB(r, π) outlier
 Efficiency: The actual CPU time is not O(n²) but linear in the data set size, since for most non-outlier objects the inner loop terminates early
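The nested-loop procedure above can be sketched directly (the sample points and thresholds below are illustrative):

```python
from math import dist  # Euclidean distance (Python 3.8+)

def db_outliers(D, r, pi):
    """Nested-loop DB(r, pi) detection: o is an outlier if at most a
    fraction pi of the objects in D lie within distance r of o."""
    n = len(D)
    outliers = []
    for i, o in enumerate(D):
        count = 0
        for j, p in enumerate(D):
            if i != j and dist(o, p) <= r:
                count += 1
                if count > pi * n:   # enough close neighbors: not an outlier,
                    break            # so terminate the inner loop early
        else:
            outliers.append(o)
    return outliers

points = [(0, 0), (0.5, 0), (0, 0.5), (0.4, 0.4), (10, 10)]
```

For the four clustered points, the inner loop breaks after a couple of comparisons; only the isolated point survives to be reported.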
Distance-Based Outlier Detection: A Grid-Based Method
 Why is efficiency still a concern? When the complete set of objects cannot be held in main memory, there is I/O swapping cost
 The major cost: (1) each object is tested against the whole data set, so why not test only against its close neighbors? (2) objects are checked one by one, so why not group by group?
 Grid-based method (CELL): The data space is partitioned into a multi-dimensional grid, where each cell is a hypercube with diagonal length r/2
 Pruning using the level-1 and level-2 cell properties:
  For any possible point x in cell C and any possible point y in a level-1 cell, dist(x, y) ≤ r
  For any possible point x in cell C and any point y such that dist(x, y) ≥ r, y is in a level-2 cell
 Thus we only need to check the objects that cannot be pruned, and even for such an object o, we only need to compute the distance between o and the objects in the level-2 cells (since beyond level-2, the distance from o is more than r)
Density-Based Outlier Detection
 Local outliers: Outliers relative to their local neighborhoods, instead of the global data distribution
 In the figure, o1 and o2 are local outliers relative to C1, o3 is a global outlier, and o4 is not an outlier. However, a distance-based method cannot identify o1 and o2 as outliers (e.g., compared with o4).
 Intuition (density-based outlier detection): The density around an outlier object is significantly different from the density around its neighbors
 Method: Use the relative density of an object against its neighbors as the indicator of the degree to which the object is an outlier
 k-distance of an object o, dist_k(o): the distance between o and its k-th nearest neighbor
 k-distance neighborhood of o: N_k(o) = {o′ | o′ ∈ D, dist(o, o′) ≤ dist_k(o)}
  N_k(o) could contain more than k objects, since multiple objects may be at identical distance from o
Local Outlier Factor: LOF
 Reachability distance from o′ to o:
\[ reachdist_{k}(o\leftarrow o^{'})=\max\left \{ dist_{k}(o),\, dist(o,o^{'}) \right \} \]
  where k is a user-specified parameter
 Local reachability density of o:
\[ lrd_{k}(o)=\frac{\left \| N_{k}(o) \right \|}{\sum_{o^{'}\in N_{k}(o)}reachdist_{k}(o^{'}\leftarrow o)} \]
 The LOF (local outlier factor) of an object o is the average of the ratios of the local reachability densities of o's k-nearest neighbors to that of o:
\[ LOF_{k}(o)=\frac{\sum_{o^{'}\in N_{k}(o)}\frac{lrd_{k}(o^{'})}{lrd_{k}(o)}}{\left \| N_{k}(o) \right \|}=\frac{\sum_{o^{'}\in N_{k}(o)}lrd_{k}(o^{'})\cdot \sum_{o^{'}\in N_{k}(o)}reachdist_{k}(o^{'}\leftarrow o)}{\left \| N_{k}(o) \right \|^{2}} \]
 The lower the local reachability density of o, and the higher the local reachability densities of the k-nearest neighbors of o, the higher the LOF
 This captures a local outlier whose local density is relatively low compared to the local densities of its k-nearest neighbors
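These definitions translate into a direct (unoptimized) sketch that recomputes neighborhoods on the fly (function names are ours):

```python
from math import dist

def knn(D, o, k):
    # k-distance and k-distance neighborhood N_k(o) of o (excluding o)
    others = sorted((p for p in D if p != o), key=lambda p: dist(o, p))
    kdist = dist(o, others[k - 1])
    return kdist, [p for p in others if dist(o, p) <= kdist]

def lrd(D, o, k):
    # local reachability density: |N_k(o)| / sum of reachdist_k(o' <- o)
    _, nbrs = knn(D, o, k)
    reach = sum(max(knn(D, p, k)[0], dist(p, o)) for p in nbrs)
    return len(nbrs) / reach

def lof(D, o, k):
    # average ratio of the neighbors' lrd to o's lrd
    _, nbrs = knn(D, o, k)
    return sum(lrd(D, p, k) for p in nbrs) / (len(nbrs) * lrd(D, o, k))

D = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
```

Points inside the unit square score about 1 (their density matches their neighbors'), while `lof(D, (5, 5), 2)` comes out well above 1.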
Clustering-Based Outlier Detection (1 & 2): Not Belonging to Any Cluster, or Far from the Closest One
 An object is an outlier if (1) it does not belong to any cluster, (2) there is a large distance between the object and its closest cluster, or (3) it belongs to a small or sparse cluster
 Case 1: Does not belong to any cluster
  Identify animals not part of a flock: use a density-based clustering method such as DBSCAN
 Case 2: Far from its closest cluster
  Using k-means, partition the data points into clusters
  For each object o, assign an outlier score based on its distance from its closest center
   If dist(o, c_o)/avg_dist(c_o) is large, o is likely an outlier
 Ex. Intrusion detection: Consider the similarity between data points and the clusters in a training data set
  Use a training set to find patterns of "normal" data, e.g., frequent itemsets in each segment, and cluster similar connections into groups
  Compare new data points with the clusters mined; outliers are possible attacks
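For Case 2, the distance-to-center score can be sketched as follows (the clusters would come from k-means in practice; the helper names are ours):

```python
from math import dist

def center(cluster):
    # centroid of a cluster of points
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))

def center_distance_score(o, clusters):
    # dist(o, c_o) / avg_dist(c_o), where c_o is o's closest cluster center
    best = min(clusters, key=lambda cl: dist(o, center(cl)))
    c = center(best)
    avg = sum(dist(p, c) for p in best) / len(best)
    return dist(o, c) / avg

cluster = [(0, 0), (2, 0), (0, 2), (2, 2)]
```

The cluster's own corners score 1.0 (they sit at the average distance from the center), while a distant point such as (10, 10) scores 9.0, so a threshold on this ratio flags it.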
Clustering-Based Outlier Detection (3): Detecting Outliers in Small Clusters
 FindCBLOF: Detect outliers in small clusters
  Find clusters and sort them in decreasing order of size
  To each data point, assign a cluster-based local outlier factor (CBLOF):
   If object p belongs to a large cluster, CBLOF = cluster size × similarity between p and its cluster
   If p belongs to a small cluster, CBLOF = cluster size × similarity between p and the closest large cluster
 Ex. In the figure, o is an outlier since its closest large cluster is C1, but the similarity between o and C1 is small. For any point in C3, its closest large cluster is C2, but its similarity to C2 is low; in addition, |C3| = 3 is small
Clustering-Based Methods: Strengths and Weaknesses
 Strength
 Detect outliers without requiring any labeled data
 Work for many types of data
 Clusters can be regarded as summaries of the data
  Once the clusters are obtained, we need only compare any object against the clusters to determine whether it is an outlier (fast)
 Weakness
  Effectiveness depends highly on the clustering method used, which may not be optimized for outlier detection
  High computational cost: need to find the clusters first
  A method to reduce the cost: fixed-width clustering
   A point is assigned to a cluster if the center of the cluster is within a predefined distance threshold of the point
   If a point cannot be assigned to any existing cluster, a new cluster is created; the distance threshold may be learned from the training data under certain conditions
Classification-Based Method I: One-Class Model
 Idea: Train a classification model that can distinguish "normal" data from outliers
 A brute-force approach: Consider a training set that contains samples labeled as "normal" and others labeled as "outlier"
  But the training set is typically heavily biased: the number of "normal" samples likely far exceeds the number of outlier samples
  Cannot detect unseen anomalies
 One-class model: A classifier is built to describe only the normal class
  Learn the decision boundary of the normal class using a classification method such as SVM
  Any samples that do not belong to the normal class (i.e., not within the decision boundary) are declared outliers
  Advantage: can detect new outliers that may not appear close to any outlier objects in the training set
  Extension: Normal objects may belong to multiple classes
Classification-Based Method II: Semi-Supervised Learning
 Semi-supervised learning: Combining classification-based and clustering-based methods
 Method
  Using a clustering-based approach, find a large cluster, C, and a small cluster, C1
  Since some objects in C carry the label "normal", treat all objects in C as normal
   Use the one-class model of this cluster to identify normal objects in outlier detection
  Since some objects in cluster C1 carry the label "outlier", declare all objects in C1 outliers
  Any object that does not fall into the model for C (such as a) is considered an outlier as well
 Comments on classification-based outlier detection methods
  Strength: Outlier detection is fast
  Bottleneck: Quality heavily depends on the availability and quality of the training set; it is often difficult to obtain representative and high-quality training data
Mining Contextual Outliers I: Transforming into Conventional Outlier Detection
 If the contexts can be clearly identified, transform the problem into conventional outlier detection
  Identify the context of the object using the contextual attributes
  Calculate the outlier score for the object in the context using a conventional outlier detection method
 Ex. Detect outlier customers in the context of customer groups
  Contextual attributes: age group, postal code
  Behavioral attributes: # of transactions/yr, annual total transaction amount
  Steps: (1) locate customer c's context, (2) compare c with the other customers in the same group, and (3) use a conventional outlier detection method
 If the context contains very few customers, generalize contexts
  Ex. Learn a mixture model U of the data on the contextual attributes, and another mixture model V of the data on the behavioral attributes
  Learn a mapping p(Vi | Uj): the probability that a data object o belonging to cluster Uj on the contextual attributes is generated by cluster Vi on the behavioral attributes
  Outlier score:
\[ S(o)=\sum_{U_{j}}p(o\in U_{j})\sum_{V_{i}}p(o\in V_{i})\, p(V_{i}\mid U_{j}) \]
Mining Contextual Outliers II: Modeling Normal Behavior with Respect to Contexts
 In some applications, one cannot clearly partition the data into contexts
  Ex. If a customer suddenly purchases a product that is unrelated to those she recently browsed, it is unclear how many of the products browsed earlier should be considered as the context
 Model the “normal” behavior with respect to contexts
 Using a training data set, train a model that predicts the expected behavior attribute values with respect to the contextual attribute values
 An object is a contextual outlier if its behavior attribute values significantly deviate from the values predicted by the model
 Using a prediction model that links the contexts and behavior, these methods avoid the explicit identification of specific contexts
 Methods: A number of classification and prediction techniques can be used to build such models, such as regression, Markov models, and finite state automata
Mining Collective Outliers I: On the Set of “Structured Objects”
 A group of objects forms a collective outlier if the objects as a group deviate significantly from the entire data set
 Need to examine the structure of the data set, i.e., the relationships between multiple data objects
 Each of these structures is inherent to its respective type of data
 For temporal data (such as time series and sequences), we explore the structures formed by time, which occur in segments of the time series or subsequences
 For spatial data, explore local areas
 For graph and network data, we explore subgraphs
 Difference from the contextual outlier detection: the structures are often not explicitly defined, and have to be discovered as part of the outlier detection process.
 Collective outlier detection methods: two categories
 Reduce the problem to conventional outlier detection
 Identify structure units, treat each structure unit (e.g., subsequence, time series segment, local area, or subgraph) as a data object, and extract features
 Then outlier detection on the set of “structured objects” constructed as such using the extracted features
Mining Collective Outliers II: Direct Modeling of the Expected Behavior of Structure Units
 Models the expected behavior of structure units directly
 Ex. 1. Detect collective outliers in online social network of customers
 Treat each possible subgraph of the network as a structure unit
 Collective outlier: An outlier subgraph in the social network
 Small subgraphs that are of very low frequency
 Large subgraphs that are surprisingly frequent
 Ex. 2. Detect collective outliers in temporal sequences
 Learn a Markov model from the sequences
 A subsequence can then be declared as a collective outlier if it significantly deviates from the model
 Collective outlier detection is subtle due to the challenge of exploring the structures in data
 The exploration typically uses heuristics, and thus may be application dependent
 The computational cost is often high due to the sophisticated mining process
Challenges for Outlier Detection in High-Dimensional Data
 Interpretation of outliers
  Detecting outliers without saying why they are outliers is not very useful in high-dimensional settings, because many features (or dimensions) are involved in a high-dimensional data set
  E.g., which subspaces manifest the outliers, or an assessment of the "outlier-ness" of the objects
 Data sparsity
  Data in high-dimensional spaces are often sparse
  The distance between objects becomes heavily dominated by noise as the dimensionality increases
 Data subspaces
  Adaptive to the subspaces signifying the outliers
  Capturing the local behavior of the data
 Scalability with respect to dimensionality
  The number of subspaces increases exponentially
Approach I: Extending Conventional Outlier Detection
 Method 1: Detect outliers in the full space, e.g., the HilOut algorithm
  Find distance-based outliers, but use the ranks of distances instead of the absolute distances in outlier detection
  For each object o, find its k-nearest neighbors: nn1(o), ..., nnk(o)
  The weight of object o:
\[ w(o)=\sum_{i=1}^{k}dist(o,nn_{i}(o)) \]
  All objects are ranked in weight-descending order
  The top-l objects in weight are output as outliers (l: a user-specified parameter)
  Employ space-filling curves for approximation: scalable in both time and space w.r.t. data size and dimensionality
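The rank-by-weight idea can be sketched as follows (without the space-filling-curve approximation, which is the part that makes the real algorithm scalable; function names are ours):

```python
from math import dist

def weight(D, o, k):
    # w(o) = sum of distances from o to its k nearest neighbors
    ds = sorted(dist(o, p) for p in D if p != o)
    return sum(ds[:k])

def top_l_outliers(D, k, l):
    # rank all objects by weight in descending order; report the top l
    return sorted(D, key=lambda o: weight(D, o, k), reverse=True)[:l]

D = [(0, 0), (1, 0), (0, 1), (1, 1), (8, 8)]
```

The isolated point accumulates by far the largest neighbor-distance sum and is ranked first.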
 Method 2: Dimensionality reduction
  Works only when, in the lower-dimensional space, normal instances can still be distinguished from outliers
  PCA: Heuristically, the principal components with low variance are preferred because, in such dimensions, normal objects are likely close to each other while outliers often deviate from the majority
Approach II: Finding Outliers in Subspaces
 Extending conventional outlier detection: hard for outlier interpretation
 Finding outliers in much lower-dimensional subspaces: easy to interpret why and to what extent an object is an outlier
  E.g., find outlier customers in a certain subspace: average transaction amount >> avg. and purchase frequency << avg.
 Ex. A grid-based subspace outlier detection method
  Project the data onto various subspaces to find an area whose density is much lower than average
  Discretize the data into a grid with φ equi-depth regions per dimension (equi-depth, so that each region holds the same fraction f = 1/φ of the objects)
  Search for regions that are significantly sparse
  Consider a k-d cube formed by k ranges on k dimensions, with n objects in total
   If the objects are independently distributed, the expected number of objects falling into a k-dimensional region is (1/φ)^k · n = f^k · n, and the standard deviation is
\[ \sqrt{f^{k}(1-f^{k})n} \]
  The sparsity coefficient of cube C:
\[ S(C)=\frac{n(C)-f^{k}n}{\sqrt{f^{k}(1-f^{k})n}} \]
  If S(C) < 0, C contains fewer objects than expected
   The more negative, the sparser C is and the more likely the objects in C are outliers in the subspace
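The sparsity coefficient is a one-line formula; this sketch evaluates it for hypothetical cell counts:

```python
def sparsity_coefficient(n_C, n, k, phi):
    # S(C) = (n(C) - f^k n) / sqrt(f^k (1 - f^k) n), with f = 1/phi
    f = 1.0 / phi
    expected = f ** k * n
    sd = (f ** k * (1 - f ** k) * n) ** 0.5
    return (n_C - expected) / sd
```

With n = 10,000 points, φ = 10 equi-depth regions per dimension, and k = 2, a cell is expected to hold 100 points; a cell holding only 20 gets a strongly negative score and is a candidate sparse region.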
Approach III: Modeling High-Dimensional Outliers
 Ex. Angle-based outliers: Kriegel, Schubert, and Zimek [KSZ08]
  For each point o, examine the angle ∠xoy for every pair of points x, y
   For a point in the center (e.g., a), the angles formed differ widely
   For an outlier (e.g., c), the angle variance is substantially smaller
  Use the variance of the angles at a point to determine whether it is an outlier
  Combine angles and distance to model outliers
   Use the distance-weighted angle variance as the outlier score
   Angle-based outlier factor (ABOF):
\[ ABOF(o)=VAR_{x,y\in D,\, x\neq o,\, y\neq o}\, \frac{\left \langle \vec{ox},\vec{oy} \right \rangle }{dist(o,x)^{2}\, dist(o,y)^{2}} \]
  An efficient approximate computation method has been developed
  The method can be generalized to handle arbitrary types of data
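A brute-force sketch of ABOF, enumerating all pairs per point (the published method uses an efficient approximation instead; function name is ours):

```python
from itertools import combinations
from math import dist
from statistics import pvariance

def abof(o, D):
    # variance, over all pairs x, y (both != o), of the distance-weighted
    # dot product <ox, oy> / (dist(o,x)^2 * dist(o,y)^2)
    vals = []
    for x, y in combinations([p for p in D if p != o], 2):
        ox = [a - b for a, b in zip(x, o)]
        oy = [a - b for a, b in zip(y, o)]
        dot = sum(a * b for a, b in zip(ox, oy))
        vals.append(dot / (dist(o, x) ** 2 * dist(o, y) ** 2))
    return pvariance(vals)

D = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (5, 5)]
```

The isolated point (5, 5) sees every other point in roughly the same direction, so its weighted angles barely vary and its ABOF is far smaller than that of the central point (0.5, 0.5).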
 Develop new models for high-dimensional outliers directly
  Avoid proximity measures and adopt new heuristics that do not deteriorate in high-dimensional data
Outlier Discovery: Statistical Approaches
 Assume a model of the underlying distribution that generates the data set (e.g., a normal distribution)
 Use discordancy tests depending on
  the data distribution
  the distribution parameters (e.g., mean, variance)
  the number of expected outliers
 Drawbacks
  Most tests are for a single attribute
  In many cases, the data distribution may not be known
Outlier Discovery: Distance-Based Approach
 Introduced to counter the main limitations imposed by statistical methods
  We need multi-dimensional analysis without knowing the data distribution
 Distance-based outlier: A DB(p, D)-outlier is an object O in a data set T such that at least a fraction p of the objects in T lie at a distance greater than D from O
 Algorithms for mining distance-based outliers [Knorr & Ng, VLDB’98]
  Index-based algorithm
  Nested-loop algorithm
  Cell-based algorithm
Density-Based Local Outlier Detection
 M. M. Breunig, H.-P. Kriegel, R. Ng, J. Sander. LOF: Identifying Density-Based Local Outliers. SIGMOD 2000
 Distance-based outlier detection is based on the global distance distribution
  It encounters difficulties identifying outliers if the data are not uniformly distributed
  Ex. C1 contains 400 loosely distributed points, C2 has 100 tightly condensed points, and there are 2 outlier points o1, o2
  A distance-based method cannot identify o2 as an outlier
  Need the concept of a local outlier
 Local outlier factor (LOF)
  Assume "outlier" is not a crisp property
  Each point has a LOF value
Outlier Discovery: Deviation-Based Approach
 Identifies outliers by examining the main characteristics of objects in a group
 Objects that "deviate" from this description are considered outliers
 Sequential exception technique
  Simulates the way in which humans can distinguish unusual objects from among a series of supposedly similar objects
 OLAP data cube technique
  Uses data cubes to identify regions of anomalies in large multidimensional data
Summary
 Types of outliers
  global, contextual, and collective outliers
 Outlier detection
  supervised, semi-supervised, or unsupervised
 Statistical (or model-based) approaches
 Proximity-based approaches
 Clustering-based approaches
 Classification-based approaches
 Mining contextual and collective outliers
 Outlier detection in high-dimensional data
References
 B. Abraham and G.E.P. Box. Bayesian analysis of some outlier problems in time series. Biometrika, 66:229–248, 1979.
 M. Agyemang, K. Barker, and R. Alhajj. A comprehensive survey of numeric and symbolic outlier mining techniques. Intell. Data Anal., 10:521–538, 2006.
 F. J. Anscombe and I. Guttman. Rejection of outliers. Technometrics, 2:123–147, 1960.
 D. Agarwal. Detecting anomalies in cross-classified streams: a Bayesian approach. Knowl. Inf. Syst., 11:29–44, 2006.
 F. Angiulli and C. Pizzuti. Outlier mining in large high-dimensional data sets. TKDE, 2005.
 C. C. Aggarwal and P. S. Yu. Outlier detection for high dimensional data. SIGMOD’01
 R. J. Beckman and R. D. Cook. Outlier...s. Technometrics, 25:119–149, 1983.
 I. Ben-Gal. Outlier detection. In Maimon O. and Rockach L. (eds.) Data Mining and Knowledge Discovery Handbook: A Complete Guide for Practitioners and Researchers, Kluwer Academic, 2005.
 M. M. Breunig, H.-P. Kriegel, R. Ng, and J. Sander. LOF: Identifying density-based local outliers. SIGMOD’00
 D. Barbará, Y. Li, J. Couto, J.-L. Lin, and S. Jajodia. Bootstrapping a data mining intrusion detection system. SAC’03
 Z. A. Bakar, R. Mohemad, A. Ahmad, and M. M. Deris. A comparative study for outlier detection techniques in data mining. IEEE Conf. on Cybernetics and Intelligent Systems, 2006.
 S. D. Bay and M. Schwabacher. Mining distance-based outliers in near linear time with randomization and a simple pruning rule. KDD’03
 D. Barbara, N. Wu, and S. Jajodia. Detecting novel network intrusion using Bayesian estimators. SDM’01
 V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Computing Surveys, 41:1–58, 2009.
 D. Dasgupta and N.S. Majumdar. Anomaly detection in multidimensional data using negative selection algorithm. In CEC’02
References (cont'd)
 E. Eskin, A. Arnold, M. Prerau, L. Portnoy, and S. Stolfo. A geometric framework for unsupervised anomaly detection: Detecting intrusions in unlabeled data. In Proc. 2002 Int. Conf. of Data Mining for Security Applications, 2002.
 E. Eskin. Anomaly detection over noisy data using learned probability distributions. ICML’00
 T. Fawcett and F. Provost. Adaptive fraud detection. Data Mining and Knowledge Discovery, 1:291–316, 1997.
 V. J. Hodge and J. Austin. A survey of outlier detection methodologies. Artif. Intell. Rev., 22:85–126, 2004.
 D. M. Hawkins. Identification of Outliers. Chapman and Hall, London, 1980.
 Z. He, X. Xu, and S. Deng. Discovering clusterbased local outliers. Pattern Recogn. Lett., 24, June, 2003.
 W. Jin, A. K. H. Tung, and J. Han. Mining top-n local outliers in large databases. KDD’01
 W. Jin, A. K. H. Tung, J. Han, and W. Wang. Ranking outliers using symmetric neighborhood relationship. PAKDD’06
 E. Knorr and R. Ng. A unified notion of outliers: Properties and computation. KDD’97
 E. Knorr and R. Ng. Algorithms for mining distance-based outliers in large datasets. VLDB’98
 E. M. Knorr, R. T. Ng, and V. Tucakov. Distance-based outliers: Algorithms and applications. VLDB J., 8:237–253, 2000.
 H.-P. Kriegel, M. Schubert, and A. Zimek. Angle-based outlier detection in high-dimensional data. KDD’08
 M. Markou and S. Singh. Novelty detection: A review—part 1: Statistical approaches. Signal Process., 83:2481–2497, 2003.
 M. Markou and S. Singh. Novelty detection: A review—part 2: Neural network based approaches. Signal Process., 83:2499–2521, 2003.
 C. C. Noble and D. J. Cook. Graph-based anomaly detection. KDD’03
References (cont'd)
 S. Papadimitriou, H. Kitagawa, P. B. Gibbons, and C. Faloutsos. LOCI: Fast outlier detection using the local correlation integral. ICDE’03
 A. Patcha and J.-M. Park. An overview of anomaly detection techniques: Existing solutions and latest technological trends. Comput. Netw., 51, 2007.
 X. Song, M. Wu, C. Jermaine, and S. Ranka. Conditional anomaly detection. IEEE Trans. on Knowl. and Data Eng., 19, 2007.
 Y. Tao, X. Xiao, and S. Zhou. Mining distance-based outliers from large databases in any metric space. KDD’06
 N. Ye and Q. Chen. An anomaly detection technique based on a chisquare statistic for detecting intrusions into information systems. Quality and Reliability Engineering International, 17:105–112, 2001.
 B.-K. Yi, N. Sidiropoulos, T. Johnson, H. V. Jagadish, C. Faloutsos, and A. Biliris. Online data mining for co-evolving time sequences. ICDE’00
References (cont'd)
 B. Abraham and G.E.P. Box. Bayesian analysis of some outlier problems in time series. Biometrika, 1979.
 M. Agyemang, K. Barker, and R. Alhajj. A comprehensive survey of numeric and symbolic outlier mining techniques. Intell. Data Anal., 2006.
 D. Agarwal. Detecting anomalies in cross-classified streams: a Bayesian approach. Knowl. Inf. Syst., 2006.
 C. C. Aggarwal and P. S. Yu. Outlier detection for high dimensional data. SIGMOD'01.
 M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. OPTICS-OF: Identifying local outliers. PKDD'99
 M. M. Breunig, H.-P. Kriegel, R. Ng, and J. Sander. LOF: Identifying density-based local outliers. SIGMOD'00.
 V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Comput. Surv., 2009.
 D. Dasgupta and N.S. Majumdar. Anomaly detection in multidimensional data using negative selection algorithm. Computational Intelligence, 2002.
 E. Eskin, A. Arnold, M. Prerau, L. Portnoy, and S. Stolfo. A geometric framework for unsupervised anomaly detection: Detecting intrusions in unlabeled data. In Proc. 2002 Int. Conf. of Data Mining for Security Applications, 2002.
 E. Eskin. Anomaly detection over noisy data using learned probability distributions. ICML’00.
 T. Fawcett and F. Provost. Adaptive fraud detection. Data Mining and Knowledge Discovery, 1997.
 R. Fujimaki, T. Yairi, and K. Machida. An approach to spacecraft anomaly detection problem using kernel feature space. KDD '05
 F. E. Grubbs. Procedures for detecting outlying observations in samples. Technometrics, 1969.
References (cont'd)
 V. Hodge and J. Austin. A survey of outlier detection methodologies. Artif. Intell. Rev., 2004.
 D. M. Hawkins. Identification of Outliers. Chapman and Hall, 1980.
 P. S. Horn, L. Feng, Y. Li, and A. J. Pesce. Effect of Outliers and Nonhealthy Individuals on Reference Interval Estimation. Clin Chem, 2001.
 W. Jin, A. K. H. Tung, J. Han, and W. Wang. Ranking outliers using symmetric neighborhood relationship. PAKDD'06
 E. Knorr and R. Ng. Algorithms for mining distancebased outliers in large datasets. VLDB’98
 M. Markou and S. Singh. Novelty detection: a review, part 1: statistical approaches. Signal Process., 83(12), 2003.
 M. Markou and S. Singh. Novelty detection: a review, part 2: neural network based approaches. Signal Process., 83(12), 2003.
 S. Papadimitriou, H. Kitagawa, P. B. Gibbons, and C. Faloutsos. LOCI: Fast outlier detection using the local correlation integral. ICDE'03.
 A. Patcha and J.-M. Park. An overview of anomaly detection techniques: Existing solutions and latest technological trends. Comput. Netw., 51(12):3448–3470, 2007.
 W. Stefansky. Rejecting outliers in factorial designs. Technometrics, 14(2):469–479, 1972.
 X. Song, M. Wu, C. Jermaine, and S. Ranka. Conditional anomaly detection. IEEE Trans. on Knowl. and Data Eng., 19(5):631–645, 2007.
 Y. Tao, X. Xiao, and S. Zhou. Mining distance-based outliers from large databases in any metric space. KDD'06
 N. Ye and Q. Chen. An anomaly detection technique based on a chisquare statistic for detecting intrusions into information systems. Quality and Reliability Engineering International, 2001.