
Track Events with Prometheus Counters


Using Prometheus for per-second metrics works really well, and I've had great success with rate and irate. I'm just at a loss as to how to graph something that happens very rarely but is a big deal when it does.

So I have a counter called job_failed that I increment whenever a job fails, and each increment shows up in my instant vector. If I graph it directly, the line only ever goes up and I see a bump in the graph, but that isn't a clear enough indication that a job has failed. I'd like it to appear as a spike on a graph that otherwise sits at zero.

If I use rate(job_failed[15s]) I get my spike, but it's a per-second spike, so its value is 0.1 even though the change I want to see is 1. I tried increase(job_failed[1m]), but that doesn't add up correctly either, occasionally giving me values like 2.18.
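The fractional results from increase() come from extrapolation: Prometheus takes the difference between the first and last samples inside the window and scales it up to cover the whole window, because the scraped samples rarely sit exactly on the window edges. The Python sketch below is a simplified model of that behavior, not the actual Prometheus code; it ignores counter resets, and the function name simplified_increase and the sample data are hypothetical.

```python
from typing import List, Tuple

# (timestamp_seconds, counter_value) pairs, oldest first.
Sample = Tuple[float, float]


def simplified_increase(samples: List[Sample], range_start: float, range_end: float) -> float:
    """Approximate increase() over [range_start, range_end].

    Simplified model of Prometheus's extrapolation (hypothetical helper);
    it ignores counter resets and other details of the real implementation.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples in the window")
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    observed = v_last - v_first          # raw difference between first and last sample
    sampled_interval = t_last - t_first  # time actually covered by samples
    avg_step = sampled_interval / (len(samples) - 1)

    # Gaps between the outermost samples and the window edges.
    to_start = t_first - range_start
    to_end = range_end - t_last
    # If a gap is much larger than the scrape interval, assume the series
    # started or stopped there and only extrapolate half an interval.
    threshold = avg_step * 1.1
    if to_start >= threshold:
        to_start = avg_step / 2
    if to_end >= threshold:
        to_end = avg_step / 2
    # A counter cannot go below zero, so never extrapolate further back
    # than the point where it would have reached zero.
    if observed > 0 and v_first >= 0:
        to_zero = sampled_interval * (v_first / observed)
        to_start = min(to_start, to_zero)

    # Scale the observed difference up to the full window length.
    return observed * (sampled_interval + to_start + to_end) / sampled_interval


# Hypothetical scrapes every 15s inside a 1m window [0, 60];
# exactly one failure happened between t=18 and t=33.
samples = [(3, 5), (18, 5), (33, 6), (48, 6)]
print(simplified_increase(samples, 0, 60))  # about 1.33, not 1
```

A single failure reads as roughly 1.33 here because 45 seconds of observed data are stretched to cover the 60-second window; different scrape alignments give different fractions, which is the kind of artifact described in the question.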

Is there a way to see just a single spike of 1? This seems like it should be trivial, but I can't figure it out.


Solution

  • Prometheus is better suited to high-volume events than to low-volume ones, because at low volumes you start to see artifacts from how we keep results accurate on average.

    So, for example, rate(job_failed[15s]) with an increase of 1 over the 15 seconds is 1/15 ≈ 0.067/s. Rounding to one decimal place could make that show as 0.1.

    https://www.youtube.com/watch?v=67Ulrq6DxwA goes into more detail as to how this all works.

    The short version: what you're doing now is the way to do it.
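The arithmetic in the answer can be checked with a few lines of Python. This is a minimal sketch that assumes the counter rose by exactly 1 inside a 15-second window and ignores the extrapolation Prometheus also applies. It also shows how the two functions relate: per the Prometheus documentation, increase() is rate() multiplied by the number of seconds in the range, so scaling the rate back up recovers the count.

```python
# One job failure inside a 15 second range, as in rate(job_failed[15s]).
window_seconds = 15
failures = 1  # assumed: the counter rose by exactly 1 within the window

# rate() is the average per-second increase over the window.
per_second = failures / window_seconds
print(round(per_second, 3))  # 0.067
print(round(per_second, 1))  # 0.1 when shown with one decimal place

# increase() is rate() times the window length in seconds,
# so multiplying back recovers the single failure.
print(per_second * window_seconds)  # 1.0
```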