prometheus, prometheus-blackbox-exporter

How to calculate "SLAs" with blackbox-exporter metrics


I have a blackbox exporter that checks some HTTP endpoints. I've noticed that it doesn't use (rightly) histograms, so I was wondering what's the best way to calculate SLAs for each endpoint?

For instance, let's say I check http://google.com. I'd like to calculate:
- the percentage of times I received a valid response (probe_success)
- the percentage of times the response was fetched within X milliseconds

I've tried using avg_over_time:

avg_over_time(probe_success{target="https://google.com"}[30d])

and dividing by the count of the same metric, but I know it's wrong and something's missing.


Solution

  • avg_over_time(probe_success[1d]) will give you a ratio between 0 (0%) and 1 (100%). There is no need to divide by a count: probe_success is 0 or 1 for each probe, so its average is already the success ratio. If you want a percentage, multiply it by 100, or set the unit in Grafana (I believe it's called "percent (0.0 - 1.0)" or something like that).

    If OTOH you want a percentile for some metric, say 90th percentile memory utilization, you'd use something like quantile_over_time(0.9, memory_utilization[1d]).
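
Applied to the two SLAs in the question, this could look roughly like the sketch below. These are three separate queries, not one. The `target` label is copied from the question (many blackbox setups expose the probed URL as `instance` instead), and the 0.5 s threshold and the 1m subquery step are assumptions chosen for illustration, so adjust them to your own relabeling and scrape interval.

```promql
# Availability: percentage of successful probes over the last 30 days.
100 * avg_over_time(probe_success{target="https://google.com"}[30d])

# Latency: percentage of probes that finished within 0.5 s (assumed threshold).
# "< bool" maps each sample to 1 (fast enough) or 0 (too slow); the subquery
# [30d:1m] evaluates that comparison every minute over 30 days so that
# avg_over_time can average it.
100 * avg_over_time(
  (probe_duration_seconds{target="https://google.com"} < bool 0.5)[30d:1m]
)

# 90th percentile of probe duration over the same window, in seconds.
quantile_over_time(0.9, probe_duration_seconds{target="https://google.com"}[30d])
```

Note that the latency query looks only at probe_duration_seconds, so a probe that fails quickly would still count as "within X". If that matters for your SLA, you may want to combine it with probe_success. Subqueries need Prometheus 2.7 or later.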