Search code examples
machine-learningcluster-analysisanomaly-detection

anomaly detection with clustering?


One of anomaly detection algorithms is to use multivariate Gaussian to construct a probability density, according to Andrew Ng's coursera lecture.

What if data show cluster structures (not a single chunk)? In this case do we resort to unsupervised clustering to construct the density? If yes, how to do it? Are there other systematic ways to discover if such a case exists?


Solution

  • You can just use regular GMM and use a threshold on the likelihood to identify outliers. Points that don't fit the model well are outliers.

    This works okay as long as your data really is composed of Gaussians.

    Furthermore, clustering is fairly expensive. Usually it will be faster to directly use a nonparametric outlier model like KNN or LOF or LOOP.