grafana, prometheus, prometheus-alertmanager

Using metric in right hand side of prometheus query


I am using Prometheus and Grafana to monitor some servers. One of the metrics I expose is recent_tables, the number of assets that have written to SQL tables in the past 15 minutes (the machines post to SQL automatically). Its labels are table, job, and status_code. I also have a metric online_assets, the number of assets that are online. Its labels are cluster_id, db_host, and job.

I am trying to make an alert for when fewer than 90% of online assets have written to SQL tables recently. Before writing the alert, I want to get the query working in a Grafana panel and then move it into an alerting rule expr for Alertmanager. The following queries return no data, and I don't understand why:

recent_tables < online_assets * 0.9

sum(recent_tables) by (table) < online_assets * 0.9
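Both queries come back empty because of PromQL's default one-to-one vector matching: a series on the left is only paired with a series on the right whose labels are exactly the same (the metric name aside). The annotated sketch below uses the label names from the question; the label sets in the comments are illustrative, not real output, and each expression is a separate query.

```promql
# Left: series labelled {table, job, status_code}
# Right: series labelled {cluster_id, db_host, job}
# No left label set equals a right label set, so nothing is paired.
recent_tables < online_assets * 0.9

# Left after aggregation: one series per table, labelled {table="..."}
# Right: still {cluster_id, db_host, job}, so again nothing is paired.
sum by (table) (recent_tables) < online_assets * 0.9

# Aggregating the right side as well is not enough on its own:
# {table="..."} still differs from the empty label set {}.
sum by (table) (recent_tables) < sum(online_assets) * 0.9
```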

However, the following query works:

sum(recent_tables{table="<table>"}) - sum(online_assets)
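This one works because sum() without a by clause removes every label, so both operands become a single series with the empty label set {} and match one-to-one. The same shape can express the 90% check for one table; `<table>` stays a placeholder, and this is a sketch rather than a tested rule.

```promql
# Both sides reduce to one label-less series, so they match one-to-one.
# Returns the table's count only when it is below 90% of online assets.
sum(recent_tables{table="<table>"}) < sum(online_assets) * 0.9

# Same check expressed as a fraction of online assets.
sum(recent_tables{table="<table>"}) / sum(online_assets) < 0.9
```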

I don't want to create a separate alert for every table (although I could generate them with Ansible); I would like to understand whether a single query can return a result for every table at once.


Solution

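  • As Michael Doubez pointed out, a binary operator between two instant vectors only pairs series whose label sets are identical (apart from the metric name), so operands with unbalanced label dimensions produce an empty result.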

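    I ended up with the following:

    sum(recent_tables) by (table) - ignoring(table) group_left() sum(online_assets) * 0.9 < 0

    ignoring(table) leaves the table label out of the matching, so every per-table series on the left matches the single series on the right, and group_left() permits this many-to-one match. This accounts for the mismatch in dimensions, but there may be a cleaner way.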
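A few formulations that should behave the same, offered as a sketch rather than a tested rule and using online_assets as named in the question. The first compares directly instead of subtracting and checking < 0, so each result value is that table's recent_tables count; on() matches on no labels at all, which here has the same effect as ignoring(table). The second converts the right-hand side with scalar(), which removes vector matching entirely; note that scalar() returns NaN unless its input has exactly one series, and a comparison against NaN is never true, so that query would silently return nothing if online_assets stopped reporting. The third returns the fraction per table, which may read more naturally in a Grafana panel.

```promql
# 1. Direct comparison: keeps each per-table series below 90% of online assets.
sum by (table) (recent_tables)
  < on() group_left() (sum(online_assets) * 0.9)

# 2. Scalar form: no vector matching needed.
#    scalar() yields NaN if sum(online_assets) is empty, and then nothing matches.
sum by (table) (recent_tables) < 0.9 * scalar(sum(online_assets))

# 3. Fraction of online assets per table, filtered to tables under 90%.
sum by (table) (recent_tables)
  / on() group_left() sum(online_assets)
  < 0.9
```

Any of these can be used as the expr of a Prometheus alerting rule, which then routes firing alerts to Alertmanager; each table below the threshold becomes its own alert because the table label is kept.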