Tags: metrics, prometheus, continuous

Prometheus query for continuous uptime


I'm a Prometheus newbie and have been trying to figure out the right query to get the last continuous uptime for my service.

For example, if the present time is 0:01:20 and my service came up at 0:00:00, went down at 0:01:01, and came up again at 0:01:10, I'd like to see an uptime of "10 seconds".

I'm mainly looking at the "up{}" metric, possibly combined with functions such as changes() and rate(), but no luck so far. I don't see any other Prometheus metric similar to "up" either.


Solution

  • The problem is that you need something that tells you when your service itself came up, as opposed to whether the node was up :)
    We use the following (I hope one of them, or at least the general idea, helps):
    1. When we look at a host, we use node_time{...} - node_boot_time{...}
    2. When we look at a specific process / container (Docker via cAdvisor in our case), we take node_time{...} - on(instance) group_right container_start_time_seconds{name=~"..."} and aggregate the result by(name,instance)
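
To make the idea concrete, here is a minimal Python sketch (not Prometheus code) of the arithmetic both expressions rely on: the current time minus the most recent start time. The helper names parse_clock and uptime_seconds are made up for illustration, and the timestamps are the ones from the question. It assumes the start-time metric is reset on every restart, which is what makes the result a "continuous" uptime.

```python
def parse_clock(text):
    """Convert an 'H:MM:SS' string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def uptime_seconds(now, start_time):
    """Uptime is the current time minus the most recent start time."""
    return now - start_time


# Timeline from the question: first start 0:00:00, restart 0:01:10, now 0:01:20.
now = parse_clock("0:01:20")
first_start = parse_clock("0:00:00")
last_start = parse_clock("0:01:10")

# Assumption: a start-time gauge always holds the latest start,
# so after the restart it reports last_start, not first_start.
current_uptime = uptime_seconds(now, last_start)

# Using the first start instead would ignore the outage entirely.
naive_uptime = uptime_seconds(now, first_start)

print(current_uptime)  # 10
print(naive_uptime)    # 80
```

The subtraction only gives the last continuous uptime because the start timestamp moves forward on each restart; a metric that merely says "up or down right now" does not carry that information by itself.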