
Difference between trigger threshold and metric_trigger threshold in Terraform


I was adding metric alerts to monitor a web API hosted in AKS and was looking at the azurerm_monitor_scheduled_query_rules_alert resource. I could not tell the difference between the two thresholds in the trigger block below. What is the purpose of each one, and where does each apply?

trigger {
  operator  = "GreaterThan"
  threshold = 3
  metric_trigger {
    operator            = "GreaterThan"
    threshold           = 1
    metric_trigger_type = "Total"
    metric_column       = "operation_Name"
  }
}

Solution

  • Through trial and error we found that metric_trigger's threshold maps to "minFailingPeriodsToAlert", i.e. how many times the outer trigger threshold must be exceeded before the alert fires. The outer trigger threshold is the value the query result itself is compared against.

    We applied an alert with a trigger like this:

    trigger {
      operator  = "GreaterThan"
      threshold = 3
      metric_trigger {
        metric_trigger_type = "Total"
        operator            = "GreaterThanOrEqual"
        threshold           = 100
        metric_column       = "fileCount"
      }
    }
    

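    and it created this resource in Azure (note that the outer threshold of 3 became "threshold" and the metric_trigger threshold of 100 became "minFailingPeriodsToAlert"):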

    "criteria": {
        "allOf": [
            {
                "query": "customEvents | where parsedStatus != \"RUNNING\" and parsedStatus != \"SUCCESS\" ",
                "timeAggregation": "Average",
                "metricMeasureColumn": "AggregatedValue",
                "dimensions": [
                    {
                        "name": "itemCount",
                        "operator": "Include",
                        "values": [
                            "*"
                        ]
                    }
                ],
                "operator": "GreaterThan",
                "threshold": 3,
                "failingPeriods": {
                    "numberOfEvaluationPeriods": null,
                    "minFailingPeriodsToAlert": 100
                }
            }
        ]
    },
    

    In the end we set metric_trigger's threshold to 0 and used the "regular" trigger threshold to configure our alert. A higher metric_trigger threshold could be useful if our application received bursts of traffic: it would let the server work through the backlog of files without firing the alert immediately.
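
To show where the two thresholds sit in a complete resource, here is a minimal annotated sketch. The resource names, the query, the "FileFailed" event name, and the referenced resource group, Application Insights instance, and action group are all hypothetical placeholders. The comments on the two thresholds restate the trial-and-error finding above and are not taken from the provider documentation.

```hcl
# Sketch only: every name below and the query are illustrative assumptions.
resource "azurerm_monitor_scheduled_query_rules_alert" "failed_files" {
  name                = "failed-files-alert"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  data_source_id      = azurerm_application_insights.example.id

  frequency   = 5  # minutes between evaluations of the query
  time_window = 30 # minutes of data each evaluation looks at

  # A metric-style query: one aggregated value per time bin and per operation.
  query = <<-QUERY
    customEvents
    | where name == "FileFailed"
    | summarize AggregatedValue = count() by bin(timestamp, 5m), operation_Name
  QUERY

  action {
    action_group = [azurerm_monitor_action_group.example.id]
  }

  trigger {
    # Outer threshold: the value each aggregated result is compared against.
    # Here a result counts as a breach when it is greater than 3.
    operator  = "GreaterThan"
    threshold = 3

    metric_trigger {
      # Inner threshold: how many breaches are needed before the alert fires
      # (observed above to map to "minFailingPeriodsToAlert").
      # Here the alert fires once there is more than 1 breach.
      operator            = "GreaterThan"
      threshold           = 1
      metric_trigger_type = "Total" # the provider also accepts "Consecutive"
      metric_column       = "operation_Name"
    }
  }
}
```

Under that reading, raising the inner threshold makes the alert tolerate a few breaching periods before it fires, while raising the outer threshold changes what counts as a breach in the first place.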