
When to use “/entire” vs “/last” API in Azure Anomaly Detector?


As I went through the docs of the Anomaly Detector APIs, I found there are two APIs (or "modes"): /last and /entire. The docs say they are streaming versus batch modes. However, I don't think the message is very clear, and the two APIs/modes seem to have really similar functionality. I have some IoT data from sensors on the factory floor; I could preprocess the data to ensure it meets the API requirements, and I could write the code in C# in my app. Could anyone help explain how to choose the better API for my scenario?

I have tried both APIs in the Azure notebooks.


Solution

  • Thanks for using Anomaly Detector.

    The Anomaly Detector API's batch detection endpoint lets you detect anomalies across your entire time series. In this detection mode, a single statistical model is created and applied to each point in the data set. If your time series has the following characteristics, we recommend using batch detection to preview your data in one API call.

    1. A seasonal time series, with occasional anomalies.
    2. A flat trend time series, with occasional spikes/dips.
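A minimal sketch of a batch (/entire) request, including the kind of preprocessing the question mentions: sort by timestamp, drop duplicate timestamps, and check the point count. It assumes the v1.0 REST path `/anomalydetector/v1.0/timeseries/entire/detect`, the `Ocp-Apim-Subscription-Key` header, and per-request limits of 12 to 8640 points; the endpoint is a placeholder, and the helper names `build_series` and `entire_detect_request` are hypothetical. It only builds the request, so it runs without network access.

```python
import json
import urllib.request
from datetime import timezone

# Assumption: placeholder resource endpoint; substitute your own.
ENDPOINT = "https://YOUR-RESOURCE.cognitiveservices.azure.com"
BASE_PATH = "/anomalydetector/v1.0/timeseries"
# Assumption: v1.0 point limits per request; check the current docs.
MIN_POINTS, MAX_POINTS = 12, 8640


def build_series(readings):
    """Turn (timezone-aware datetime, value) pairs into a sorted, de-duplicated series.

    If a timestamp repeats, the value seen last wins.
    """
    latest = {}
    for ts, value in readings:
        latest[ts] = float(value)
    series = [
        {"timestamp": ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
         "value": value}
        for ts, value in sorted(latest.items())  # ascending timestamps
    ]
    if not MIN_POINTS <= len(series) <= MAX_POINTS:
        raise ValueError(f"need {MIN_POINTS}..{MAX_POINTS} points, got {len(series)}")
    return series


def entire_detect_request(readings, granularity, key):
    """Build (but do not send) one batch request covering the whole series."""
    body = {"series": build_series(readings), "granularity": granularity}
    return urllib.request.Request(
        ENDPOINT + BASE_PATH + "/entire/detect",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Ocp-Apim-Subscription-Key": key},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` and reading `json.load(resp)["isAnomaly"]` should give one flag per submitted point, in the same order as `series`, which is why the input must be sorted first.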

    We don't recommend using batch anomaly detection for real-time data monitoring, or on time series data that doesn't have the above characteristics, for two reasons:

    1. Batch detection creates and applies only one model, so the detection for each point is done in the context of the whole series.

      If the time series data trends up and down without seasonality, some points of change (dips and spikes in the data) may be missed by the model. Similarly, some points of change that are less significant than ones later in the data set may not be counted as significant enough to be incorporated into the model.

    2. When doing real-time data monitoring, batch detection is slower than detecting the anomaly status of the latest point, because every call scores every point in the series rather than only the latest one.
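To make the second point concrete, here is rough arithmetic for monitoring with one call per new reading. It assumes that the number of submitted points is a fair proxy for cost (actual service latency is not modeled) and that the growing history stays under the per-request cap; `points_analyzed` is a hypothetical helper.

```python
def points_analyzed(history, ticks, window):
    """Count points submitted when monitoring `ticks` new readings, one call per reading.

    history: points collected before monitoring starts (assumed to stay under the
             per-request cap, so the whole series still fits in one /entire call).
    window:  trailing window size sent to last-point detection.
    Returns (points submitted via /entire, points submitted via /last).
    """
    # /entire re-submits the full, growing series on every tick.
    entire = sum(history + t for t in range(1, ticks + 1))
    # /last submits only the fixed trailing window on every tick.
    last = window * ticks
    return entire, last


print(points_analyzed(history=1000, ticks=60, window=29))  # → (61830, 1740)
```

With a 1000-point history and an hour of per-minute readings, batch monitoring submits roughly 35 times as many points, and the gap widens as the history grows.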

    For real-time data monitoring, we recommend detecting the anomaly status of your latest data point only. By continuously applying latest point detection, streaming data monitoring can be done more efficiently and accurately.
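A minimal sketch of latest-point monitoring: keep a fixed trailing window of readings and build one /last request each time a new reading arrives. It assumes the v1.0 path `/anomalydetector/v1.0/timeseries/last/detect`, a placeholder endpoint, ISO 8601 UTC timestamp strings supplied by the caller, and the 12-point minimum; the class name `LastPointMonitor` is hypothetical.

```python
import json
import urllib.request
from collections import deque

# Assumption: placeholder resource endpoint; substitute your own.
ENDPOINT = "https://YOUR-RESOURCE.cognitiveservices.azure.com"
LAST_URL = ENDPOINT + "/anomalydetector/v1.0/timeseries/last/detect"


class LastPointMonitor:
    """Keeps a trailing window and builds one /last/detect request per new reading."""

    def __init__(self, window, granularity, key):
        if window < 12:  # assumed v1.0 minimum points per request
            raise ValueError("need at least 12 points per request")
        self.points = deque(maxlen=window)  # oldest readings fall off automatically
        self.granularity = granularity
        self.key = key

    def add(self, timestamp, value):
        """Append a reading; return a request once the window is full, else None."""
        self.points.append({"timestamp": timestamp, "value": float(value)})
        if len(self.points) < self.points.maxlen:
            return None  # not enough history yet
        body = {"series": list(self.points), "granularity": self.granularity}
        return urllib.request.Request(
            LAST_URL,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json",
                     "Ocp-Apim-Subscription-Key": self.key},
            method="POST",
        )
```

Each returned request, once sent, should yield a single `isAnomaly` flag for the newest reading only, so the per-call payload stays constant no matter how long the sensor has been running.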

    The example below describes the impact these detection modes can have on performance. The first picture shows the result of continuously detecting the anomaly status of the latest point, using the 28 previously seen data points. The red points are anomalies.

    [Figure: anomaly detection using the latest point]
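A comparison like the one in the first picture can be replayed offline: walk the series and, at each step, judge the newest point against its 28 predecessors. In this sketch `detect_last` is a hypothetical hook (in practice it would wrap a /last request and return `isAnomaly`), and `toy_detect_last` is a simple z-score stand-in so the code runs without the service; it is not the Anomaly Detector model.

```python
import statistics
from typing import Callable, List, Sequence


def replay_last_point(values: Sequence[float], history: int,
                      detect_last: Callable[[Sequence[float]], bool]) -> List[int]:
    """Return indices flagged when each point is judged against its `history` predecessors."""
    flagged = []
    for i in range(history, len(values)):
        # Window = `history` earlier points plus the point being judged.
        if detect_last(values[i - history:i + 1]):
            flagged.append(i)
    return flagged


def toy_detect_last(window: Sequence[float], k: float = 3.0) -> bool:
    """Stand-in detector (NOT the service's model): z-score of the last point vs. the rest."""
    *past, latest = window
    mu = statistics.mean(past)
    sigma = statistics.pstdev(past) or 1e-9  # avoid dividing by a zero spread
    return abs(latest - mu) > k * sigma


values = [10.0] * 40
values[30], values[35] = 50.0, -20.0  # one spike, one dip
print(replay_last_point(values, history=28, detect_last=toy_detect_last))  # → [30, 35]
```

Because each point is judged only against its recent past, the dip at index 35 is still caught even though the earlier spike has widened the spread of the window.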

    Below is the same data set using batch anomaly detection. The model built for this operation missed several anomalies, marked by rectangles.

    [Figure: anomaly detection using the batch method]
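To find the points the batch model missed on your own data, you can diff the two results. This sketch assumes `batch_is_anomaly` is the per-point `isAnomaly` list from an /entire response and `streaming_flagged` holds indices flagged by latest-point replay; `compare_modes` is a hypothetical helper. Points before `first_scored` are skipped, since streaming detection cannot judge them without enough history.

```python
from typing import Iterable, List, Sequence, Tuple


def compare_modes(batch_is_anomaly: Sequence[bool],
                  streaming_flagged: Iterable[int],
                  first_scored: int) -> Tuple[List[int], List[int]]:
    """Return (flagged only by streaming, flagged only by batch) over the comparable range."""
    batch = {i for i, flag in enumerate(batch_is_anomaly) if flag and i >= first_scored}
    stream = {i for i in streaming_flagged if i >= first_scored}
    return sorted(stream - batch), sorted(batch - stream)


print(compare_modes([True, False, False, True, False, True], [3, 4], first_scored=2))
# → ([4], [5])
```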

    Thanks again; we will add this information to the public documentation of the Anomaly Detector service.