
Why are there both counters and gauges in Prometheus if gauges can act as counters?


When deciding between Counter and Gauge, the Prometheus documentation states:

To pick between counter and gauge, there is a simple rule of thumb: if the value can go down, it is a gauge. Counters can only go up (and reset, such as when a process restarts).

They seem to cover overlapping use cases: you could use a Gauge that only ever increases. So why even create the Counter metric type in the first place? Why don't you simply use Gauges for both?


Solution

  • From a conceptual point of view, gauges and counters serve different purposes:

    • a gauge typically represents a state, usually with the purpose of detecting saturation.
    • the absolute value of a counter is not really meaningful; its real purpose is to compute an evolution over time (usually a utilization) with functions like rate(), irate() or increase().

    Those evolution operations require a reliable computation of the increase, which you could not achieve with a gauge because you need to detect resets of the value.

    Technically, a counter has two important properties:

    1. it always starts at 0
    2. it only ever increases (i.e. the code only increments it)
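
    To make the two properties concrete, here is a minimal sketch of a type that enforces them. The class name `SketchCounter` and its methods are hypothetical, chosen only for illustration; this is not the API of any Prometheus client library.

```python
class SketchCounter:
    """Hypothetical minimal counter, for illustration only."""

    def __init__(self):
        # Property 1: a freshly started process always begins at 0.
        self._value = 0.0

    def inc(self, amount=1.0):
        # Property 2: the value may only grow, so negative amounts are rejected.
        if amount < 0:
            raise ValueError("a counter can only be incremented")
        self._value += amount

    @property
    def value(self):
        return self._value


requests_total = SketchCounter()
requests_total.inc()
requests_total.inc(2)
print(requests_total.value)  # 3.0
```

    A gauge would expose the same `inc()` plus a `dec()` or `set()`, which is exactly what breaks the guarantee that a drop in value means a restart.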

    If the application restarts between two Prometheus scrapes, the value of the second scrape is likely to be lower than that of the previous scrape. That drop reveals the reset, so the increase can be recovered (only partly, because you always lose whatever was counted between the last scrape and the reset).

    A simple algorithm to compute the increase of a counter between two scrapes at t1 and t2 is:

    • if counter(t2) >= counter(t1) then increase=counter(t2)-counter(t1)
    • if counter(t2) < counter(t1) then increase=counter(t2)
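
    The two rules above can be sketched in a few lines of Python. This is a simplified illustration of the reset-detection idea only, not the actual implementation of `rate()` or `increase()` in Prometheus; the function names and the sample values are made up for the example.

```python
def increase_between(previous, current):
    """Increase of a counter between two consecutive scrapes."""
    if current >= previous:
        return current - previous
    # The value dropped: assume the process restarted and counted up from 0.
    return current


def total_increase(samples):
    """Sum the per-scrape increases over a list of scraped values."""
    return sum(increase_between(a, b) for a, b in zip(samples, samples[1:]))


# The process restarts after the scrape that returned 130.
scrapes = [100, 120, 130, 5, 25]
print(total_increase(scrapes))  # 20 + 10 + 5 + 20 = 55
```

    Whatever was counted between the scrape at 130 and the restart never shows up in any sample, which is the unrecoverable part mentioned above.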

    In conclusion, from a technical point of view, you can use a gauge instead of a counter provided you reset it to 0 at startup and only ever increment it, but any violation of this contract will lead to wrong values.
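
    As an illustration of such a violation, the sketch below applies the same counter-style increase rule to a value that is allowed to go down. The function name and the sample series are hypothetical.

```python
def counter_style_increase(samples):
    """Counter-style increase: every drop is interpreted as a restart from 0."""
    total = 0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # Misread as a reset: the whole current value is counted as new.
            total += current
    return total


# A gauge tracking in-flight requests: it rises to 12, then falls back to 4.
in_flight = [10, 12, 8, 4]
print(counter_style_increase(in_flight))  # 2 + 8 + 4 = 14, although the value fell by 6 overall
```

    Each decrement is treated as a process restart, so the computed increase has no relation to what actually happened.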

    As a side note, I would also expect a counter implementation to use an unsigned integer representation, while a gauge would rather use a floating-point representation. This has some minor impacts on the code, such as the ability to wrap around to 0 automatically on overflow and better support for atomic operations on current CPUs.