
Unit Testing Prometheus Alerts: input series and interval


I have been writing unit tests for my Prometheus alerts. I have just increased the range interval in my alert, so I need to modify my current test. This is my modified test:

  - interval: 15m
    # Series data.
    input_series:
      - series: 'some_bucket{service_name="some-service", le="1000"}'
        values: 6 6 6 6 6 6 6
      - series: 'some_bucket{service_name="some-service", le="10000"}'
        values: 10 11 12 13 14 14 14
      - series: 'some_bucket{service_name="some-service", le="+Inf"}'
        values: 10 100 200 300 400 500 600
    alert_rule_test:
      - eval_time: 5m
        alertname: someName
        exp_alerts: []
      - eval_time: 15m
        alertname: someName
        exp_alerts:
          - exp_labels:
              severity: error
              service_name: some-service
            exp_annotations:
              summary: "a summary"
              description: "adescription"

and my alert rule is:

 histogram_quantile(0.95, sum by(le) (rate(some_bucket{service_name="some-service"}[15m]))) >= 1000

The test works fine: the alert does not fire at the eval_time of 5 minutes, and it does fire once the correct interval is reached. My question is about the interval set at the top:

 - interval: 15m

My understanding is that this should be the scrape interval, but if I change it to 1m the test fails. Why is that? Does it mean that my time series/input data needs changing?

Thank you


Solution

  • The given interval is not the scrape interval per se but the time between the values in the series.

    Setting interval to 15 min means that your series (with seven entries each, so six gaps between them) define data for 6 x 15 = 90 minutes.

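    Setting this to 1m means that the last sample sits at the six-minute mark, so after six minutes your test data runs out. I couldn't find this behavior spelled out in the unit-testing documentation, but Prometheus generally keeps returning the last sample only for the lookback window (5 minutes by default), after which the series has no value at all.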

    The following test passes with interval: 15m. Setting it to 1m breaks the test, and you can see that you get 'nil' as the values for the buckets.

    evaluation_interval: 1m
    
    tests:
      - interval: 15m
        # Series data.
        input_series:
          - series: 'some_bucket{service_name="some-service", le="1000"}'
            values: 6 6 6 6 6 6 6
          - series: 'some_bucket{service_name="some-service", le="10000"}'
            values: 10 11 12 13 14 14 14
          - series: 'some_bucket{service_name="some-service", le="+Inf"}'
            values: 10 100 200 300 400 500 600
        promql_expr_test:
          - expr: histogram_quantile(0.95, sum by(le) (rate(some_bucket{service_name="some-service"}[15m])))
            eval_time: 15m
            exp_samples:
              - value: 10000
          - expr: some_bucket
            eval_time: 16m
            exp_samples:
              - labels: 'some_bucket{service_name="some-service",le="1000"}'
                value: 6
              - labels: 'some_bucket{service_name="some-service",le="10000"}'
                value: 11
              - labels: 'some_bucket{service_name="some-service",le="+Inf"}'
                value: 100
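To make the effect of interval concrete, here is a minimal Python sketch (not promtool code; the helper names expand, instant_value and samples_in_range are hypothetical). It assumes the i-th value of an input series is placed at i x interval, that an instant lookup only sees samples from the last 5 minutes (Prometheus's default lookback delta), and that range windows are left-open as in Prometheus 3.x (2.x also included the left boundary, which does not change the numbers used here). Since rate() needs at least two samples in its window, the last column hints at why the question's eval_time: 5m step (expecting no alert) may start failing with interval: 1m, assuming the rule has no `for` clause.

```python
# Hypothetical model of how input_series values are laid out in time and which
# samples a query can see. Assumes a 5-minute lookback and left-open windows.
LOOKBACK_S = 5 * 60  # assumed default lookback delta, in seconds


def expand(values, interval_s):
    """Give the i-th space-separated value the timestamp i * interval_s (seconds)."""
    return [(i * interval_s, float(v)) for i, v in enumerate(values.split())]


def instant_value(samples, eval_s, lookback_s=LOOKBACK_S):
    """Latest sample no later than eval_s and within the lookback window, else None."""
    visible = [v for t, v in samples if eval_s - lookback_s < t <= eval_s]
    return visible[-1] if visible else None


def samples_in_range(samples, eval_s, range_s):
    """Samples a range selector such as [15m] would hand to rate() at eval_s."""
    return [(t, v) for t, v in samples if eval_s - range_s < t <= eval_s]


# The bucket series from the test above, keyed by their le label.
SERIES = {
    "1000": "6 6 6 6 6 6 6",
    "10000": "10 11 12 13 14 14 14",
    "+Inf": "10 100 200 300 400 500 600",
}

for interval_min in (15, 1):
    interval_s = interval_min * 60
    print(f"interval: {interval_min}m")
    for le, values in SERIES.items():
        samples = expand(values, interval_s)
        last_min = samples[-1][0] // 60
        at_16m = instant_value(samples, 16 * 60)
        n_rate_5m = len(samples_in_range(samples, 5 * 60, 15 * 60))
        print(f"  le={le}: last sample at {last_min}m, value at 16m = {at_16m}, "
              f"samples for rate(...[15m]) at 5m = {n_rate_5m}")
```

With interval 15m the last sample is at 90m, every bucket still has a value at 16m (6.0, 11.0 and 100.0, matching the exp_samples above), and only one sample falls inside the rate window at 5m. With interval 1m the last sample is at 6m, the lookup at 16m returns None, and six samples fall inside the window at 5m.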