Tags: prometheus, promql, prometheus-node-exporter

Calculate difference between first and last element in counter metrics' range vector


I'm using a PromQL query to calculate the cumulative traffic sent/received through some interfaces on any node over the last 60 minutes, using Prometheus Node Exporter's metrics:

delta(node_network_receive_bytes_total{device=~"ens.*"}[60m])*8

This works fine as long as the node does not reboot during that interval: the value is simply the difference between the last and first samples in the range vector. When the system reboots and the counter resets, the function no longer returns that result.

For example, given this graph for node_network_transmit_bytes_total:

[Figure: IPv6 traffic]

... the function returns -9 MiB instead of 10.2 MiB.

I guess I could play around with rate() to get an estimate that also takes the time into account. But is there a better function or way to get the actual value?


Solution

  • As indicated in the documentation of delta():

    delta should only be used with gauges.

    Use the increase() function instead, which is specific to counters. Its documentation notes:

    Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for.

    This is one of the main reasons to distinguish between gauges and counters. See this answer about the difference.

    You can identify counters by one of the following methods:

    • the TYPE line in the exporter's text output (e.g. # TYPE http_requests_total counter)
    • the value is monotonically increasing (Grafana suggests counter-related functions when it detects this)
    • the name ends with _total (if the exporter follows best practices)
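
As a sketch of the increase() suggestion above, the query from the question could be rewritten as follows. The metric name, device matcher, 60m window and the *8 bytes-to-bits factor are taken from the question. The second query, using resets(), is an optional addition that is not part of the original answer; it only shows whether a reset happened in the window.

```promql
# Bits received per interface over the last 60 minutes.
# increase() adjusts for counter resets such as a node reboot.
increase(node_network_receive_bytes_total{device=~"ens.*"}[60m]) * 8

# Optional check: number of counter resets seen in the same window.
# A non-zero value means the counter dropped at least once.
resets(node_network_receive_bytes_total{device=~"ens.*"}[60m])
```

Note that increase() is defined as rate() multiplied by the number of seconds in the window, and it extrapolates to the window boundaries. The result can therefore be non-integer and may differ slightly from the raw difference between the last and first samples.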