prometheus grafana prometheus-alertmanager

CPU load average rule for one minute


Thank you in advance for reading my question.

I'm using both Grafana and Prometheus to monitor systems.

avg(node_load1{instance="$node",job="$job"}) / count(count(node_cpu_seconds_total{instance="$node",job="$job"}) by (cpu)) * 100

The query above is used in Grafana to show the 1-minute load average as a percentage of the number of CPUs.

Could you let me know how to write this query in Prometheus? I copied it and pasted it into Prometheus, but it returned no result.

node_load1 / count by (instance, job) (node_cpu_seconds_total{mode="idle"}) * 100 >= 95

I also tried the query above. However, the calculated CPU load average was slightly different from the result of the original query in Grafana.


Solution

  • "$node" and "$job" are variables defined in Grafana, so they can't be used in Prometheus this way: Prometheus treats them as literal label values, which match no series. Replace them with the real values, for example:

    avg(node_load1{instance="my_instance",job="my_job"}) / count(count(node_cpu_seconds_total{instance="my_instance",job="my_job"}) by (cpu)) * 100
    

    Or simply remove them, in which case the query aggregates over every instance and job that Prometheus scrapes:

    avg(node_load1) / count(count(node_cpu_seconds_total) by (cpu)) * 100
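
  • The query without label matchers returns a single figure across all targets. If one value per host is wanted instead, the same calculation can be grouped by label. A minimal sketch, assuming the standard node_exporter labels (instance, job, cpu) are present on both metrics:

    avg by (instance, job) (node_load1) / count by (instance, job) (count by (instance, job, cpu) (node_cpu_seconds_total)) * 100

    Here the inner count collapses the per-mode series into one series per CPU, the outer count gives the number of CPUs per host, and the division pairs each host's load with its own CPU count.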