I'm working with a poller that polls every minute, and I query aggregate data from it by the hour. The 1-minute data looks something like this:
my_metric{system="sys1", subsystem="ss1", group="A"} 1
my_metric{system="sys1", subsystem="ss2", group="A"} 1
my_metric{system="sys1", subsystem="ss3", group="B"} 1
my_metric{system="sys2", subsystem="ss4", group="A"} 1
my_metric{system="sys2", subsystem="ss5", group="B"} 1
my_metric{system="sys2", subsystem="ss6", group="A"} 1
I want to count the number of systems each hour that are in each group. However, some systems change from A to B within the 1-hour window, and using count by (system, group) or similar queries counts these systems twice. So is there a way to use label_replace or group or count distinct to do something like: if A and B both exist within the 1-hour window, then label_replace with "Updated"?
Without being able to test the queries it's hard to guess whether they will work as intended, especially for non-trivial queries like these.
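The first operator we need is the unless operator. Note that it is not an XOR: metricA unless metricB returns the series of metricA for which no series with matching labels exists in metricB, and it never returns series of metricB. By default the match is on all labels, and since the two sides differ in the group label we have to add ignoring(group). In combination with avg_over_time we can do the following:
avg_over_time(my_metric{group="A"}[1h])
unless ignoring(group)
avg_over_time(my_metric{group="B"}[1h])
This gives us all the series that existed only in group A within the last hour. (Swap A and B to get the series that existed only in group B.)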
Now we need to handle the cases where a system switched groups. In that case you need to decide whether you want it counted for A or for B. Here we can use the and operator: metricA and metricB returns the series of metricA for which a series with matching labels also exists in metricB.
avg_over_time(my_metric{group="A"}[1h])
and ignoring(group)
avg_over_time(my_metric{group="B"}[1h])
returns the group A series for everything that existed in both groups within the last hour. (If you need it the other way around, just switch A and B.)
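Before building the full query, it can help to look at how many series are affected. The following is a minimal sketch, assuming that system and subsystem together identify a series and that only the group label changes when a switch happens. The ignoring(group) modifier is needed because set operators match on all labels by default, and the two sides always differ in group.

```promql
# Number of series per system that appeared in both group A and group B
# at some point during the last hour.
count by (system) (
  avg_over_time(my_metric{group="A"}[1h])
  and ignoring(group)
  avg_over_time(my_metric{group="B"}[1h])
)
```

If this returns nothing, no series switched groups in the window, and a plain count by (group) would not double count.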
The next operator is or: metricA or metricB returns all series of metricA, plus the series of metricB that have no matching label set in metricA.
(
avg_over_time(my_metric{group="A"}[1h])
unless ignoring(group)
avg_over_time(my_metric{group="B"}[1h])
)
or
(
avg_over_time(my_metric{group="B"}[1h])
unless ignoring(group)
avg_over_time(my_metric{group="A"}[1h])
)
or
(
avg_over_time(my_metric{group="A"}[1h])
and ignoring(group)
avg_over_time(my_metric{group="B"}[1h])
)
should now return the series that existed in only one group, plus the group A series for everything that existed in both groups. The only thing left to do is put a count by (group) around it, and it should give you the expected results.
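If the switched series should show up under their own label value, as the question suggests with "Updated", the branch for series seen in both groups can be wrapped in label_replace. This is an untested sketch under the same assumptions as above (only the group label differs between the two sides, and A and B are the only groups); it uses ignoring(group) so that the set operators match series across groups.

```promql
# Count series per group over the last hour. Series seen in both A and B
# are relabeled to group="Updated" so that they are counted only once.
count by (group) (
    (
      avg_over_time(my_metric{group="A"}[1h])
      unless ignoring(group)
      avg_over_time(my_metric{group="B"}[1h])
    )
  or
    (
      avg_over_time(my_metric{group="B"}[1h])
      unless ignoring(group)
      avg_over_time(my_metric{group="A"}[1h])
    )
  or
    label_replace(
      avg_over_time(my_metric{group="A"}[1h])
      and ignoring(group)
      avg_over_time(my_metric{group="B"}[1h]),
      "group", "Updated", "group", ".*"
    )
)
```

Note that this counts series, which in the sample data means one per subsystem. If the count should be per system instead, the inner expressions would first need to be aggregated by (system, group).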
(In general it's good practice to build up non-trivial queries step by step and test them each time, so you know which metrics are counted and whether they are exactly what you are looking for.)
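One sanity check along those lines, sketched here under the assumption that system and subsystem together identify a series: count the distinct series seen in the last hour regardless of group. The per-group counts of a query that counts each series exactly once should add up to this total.

```promql
# Total number of distinct (system, subsystem) pairs seen in the last hour,
# independent of which group(s) they were in.
count(
  count by (system, subsystem) (
    avg_over_time(my_metric[1h])
  )
)
```

If the per-group counts sum to more than this number, some series are still being counted in more than one group.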