Using Prometheus for per-second metrics works really well, and I've had great success with rate and irate. I am just at a loss as to how to graph something that happens very rarely but is a big deal.
So I have a counter called job_failed that I increment. Whenever that happens, it shows up in my instant vector. If I graph it directly, it only ever goes up and I see a step in the graph, but this doesn't give me a clear enough indication that a job has failed. I'd like it to appear as a spike on an otherwise zero graph.
If I use rate(job_failed[15s]), I get my spike, but it's a per-second spike, so its value is 0.1 although the change I want to see is 1.
I tried increase(job_failed[1m]), but that doesn't add up correctly either, occasionally giving me values like 2.18.
Is there a way to see only a single spike? This seems like a rather trivial thing, but I can't figure it out.
Prometheus is better suited to high-volume events than to low-volume ones. At low volumes, artifacts of the techniques we use to keep results accurate on average become visible.
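One such artifact is extrapolation, and it is why increase() can return non-integer values like 2.18. Prometheus takes the difference between the first and last samples inside the window and scales it up to cover the whole window. Below is a simplified Python sketch of that logic, modeled on my understanding of the extrapolated-rate calculation in the PromQL engine. Treat it as an illustration, not the exact implementation, since details can differ between Prometheus versions. The function name, the (timestamp, value) sample layout, and the 15s scrape interval in the example are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # hypothetical layout: (timestamp_seconds, counter_value)


def extrapolated_increase(samples: List[Sample], range_start: float, range_end: float) -> float:
    """Simplified sketch of how increase() extrapolates a counter over [range_start, range_end]."""
    if len(samples) < 2:
        raise ValueError("need at least two samples in the window")

    # Raw delta between the first and last sample, compensating for counter resets.
    delta = samples[-1][1] - samples[0][1]
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur < prev:
            delta += prev  # counter reset: add back the value lost at the reset

    first_t, first_v = samples[0]
    last_t = samples[-1][0]
    sampled_interval = last_t - first_t
    avg_step = sampled_interval / (len(samples) - 1)
    threshold = avg_step * 1.1  # gaps shorter than this are extrapolated in full

    duration_to_start = first_t - range_start
    duration_to_end = range_end - last_t

    # A counter cannot go below zero, so don't extrapolate back past the point where it would.
    if delta > 0 and first_v >= 0:
        duration_to_zero = sampled_interval * (first_v / delta)
        duration_to_start = min(duration_to_start, duration_to_zero)

    extrapolate_to = sampled_interval
    extrapolate_to += duration_to_start if duration_to_start < threshold else avg_step / 2
    extrapolate_to += duration_to_end if duration_to_end < threshold else avg_step / 2

    # Scale the observed delta from the sampled interval up to the extrapolated interval.
    return delta * (extrapolate_to / sampled_interval)


# One failure (counter 3 -> 4), scraped every 15s, inside a 1m window from t=0 to t=60.
print(extrapolated_increase([(5, 3), (20, 3), (35, 4), (50, 4)], 0, 60))
```

In this example the samples cover only 45 of the 60 seconds. The single failure is therefore scaled by 60/45, and the window reports an increase of about 1.33 rather than 1. Two failures with less favorable sample alignment can produce values like the 2.18 above in the same way.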
So, for example, rate(job_failed[15s]) with an increase of 1 over those 15 seconds is 1/15 ≈ 0.067/s. Rounding could make that show as 0.1.
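The arithmetic can be sketched in a few lines of Python. The function name is hypothetical, and the one-decimal display is an assumption about how the graph rounds values. The sketch shows how a single failure becomes 0.1 on the graph, and how multiplying the rate by the window length recovers the count of 1.

```python
def per_second_rate(increase: float, window_seconds: float) -> float:
    """Average per-second rate for a given increase over a window (hypothetical helper)."""
    return increase / window_seconds


rate = per_second_rate(1, 15)
print(f"{rate:.3f}/s")       # 0.067/s: one failure spread over 15 seconds
print(f"{rate:.1f}")         # 0.1: what a graph showing one decimal place would display
print(round(rate * 15, 6))   # 1.0: rate times window length gives back the increase
```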
https://www.youtube.com/watch?v=67Ulrq6DxwA goes into more detail on how this all works.
The short version is what you're doing now is the way to do it.