Anomaly Detection Algorithms


I am tasked with detecting anomalies (known or unknown) in data of various formats, e.g. emails, IMs, etc., using machine-learning algorithms.

  1. What are your favorite and most effective anomaly detection algorithms?

  2. What are their limitations and sweet spots?

  3. How would you recommend those limitations be addressed?

All suggestions very much appreciated.


Solution

  • Statistical filters, like Bayesian filters or the bastardised versions employed by some spam filters, are easy to implement. Plus there is lots of online documentation about them.

    The big downside is that such a filter cannot really detect unknown things. You train it on a large sample of known data so that it can categorize new incoming data, which means it only recognizes the categories it was trained on. But you can turn the traditional spam filter upside down: train it to recognize legitimate data instead of illegitimate data, so that anything it doesn't recognize is treated as an anomaly.
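
To make the "upside down" idea concrete, here is a minimal sketch in Python. It is not a full Bayesian spam filter: it assumes a simple unigram word model with Laplace smoothing, trained only on legitimate messages, and flags anything whose average per-word log-probability falls below a threshold. The names `LegitimateModel`, `tokenize`, the sample messages, and the 0.1 margin are all hypothetical and chosen purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class LegitimateModel:
    """Unigram model trained only on known-good messages (illustrative name)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha        # Laplace smoothing constant (assumed value)
        self.counts = Counter()   # word -> occurrences in legitimate data
        self.total = 0            # total number of tokens seen in training

    def train(self, messages):
        for message in messages:
            tokens = tokenize(message)
            self.counts.update(tokens)
            self.total += len(tokens)

    def score(self, message):
        """Average log-probability per token; lower means less familiar."""
        tokens = tokenize(message)
        if not tokens:
            return float("-inf")
        vocab = len(self.counts) + 1  # one extra slot for unseen words
        denom = self.total + self.alpha * vocab
        log_prob = sum(
            math.log((self.counts[t] + self.alpha) / denom) for t in tokens
        )
        return log_prob / len(tokens)

    def is_anomaly(self, message, threshold):
        return self.score(message) < threshold


# Toy training set of legitimate messages (made up for this example).
legit = [
    "meeting moved to friday at noon",
    "please review the attached report",
    "lunch at noon on friday",
]
model = LegitimateModel()
model.train(legit)

# One simple way to pick a threshold: slightly below the least familiar
# training message. The 0.1 margin is an arbitrary assumption.
threshold = min(model.score(m) for m in legit) - 0.1

familiar = "please review the report at noon"
strange = "wire bitcoin urgently offshore account"
print(model.is_anomaly(familiar, threshold))
print(model.is_anomaly(strange, threshold))
```

In practice the threshold would need tuning on held-out legitimate data, and a model this simple will also flag legitimate messages that happen to use new vocabulary, so it is only a starting point.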