I am trying to match two metrics that have a one-to-many relationship. The first metric, webrtc_metrics_audio_outbound_rtp_bytes_sent, maps to many of the second metric, relay_node_audio_track_bytes. That is to say, for each audio outbound RTP stream there are many relay nodes consuming the stream. Each stream has a session_id that I'm trying to match across the metrics while also retaining the pod_name that is specific to each relay node.
I'm using comparison operators with bool modifiers because my intent is to set up an alert based on these two metrics. The alert should fire whenever there is non-zero data for webrtc_metrics_audio_outbound_rtp_bytes_sent but zero data on relay_node_audio_track_bytes for the same session_id.
Here is my attempt using the following query and the corresponding output in Grafana:
((sum by (session_id, pod_name) (rate(relay_node_audio_track_bytes{pod_name=~"$pod",session_id=~"$session"}[$__rate_interval]))) == bool 0)
* on (session_id) group_left(pod_name)
((sum by (session_id) (label_replace(rate(webrtc_metrics_audio_outbound_rtp_bytes_sent{app="capturer",id=~"$session"}[$__rate_interval]), "session_id","$1","id","(.*)"))) > bool 0) > 0
You can see in the first graph that when a Pod is selected in the dropdown, the query works as intended. But when I try to use a wildcard to query for all pods, I receive the error: execution: multiple matches for labels: grouping labels must ensure unique matches
Here are the left and right sides of the query showing all the labels in each metric. Note that I used label_replace in the vector matching query to rename id to session_id.
LHS: (rate(playback_relay_node_audio_track_bytes{pod_name=~"$pod",session_id=~"$session"}[$__rate_interval]))
RHS: label_replace(rate(webrtc_metrics_audio_outbound_rtp_bytes_sent{app="capturer",id=~"$session"}[$__rate_interval]), "session_id","$1","id","(.*)")
Can somebody please explain why selecting a specific Pod does not throw the same error as using a wildcard label matcher? Are there other labeling methods I need to use to get this working as intended? Ideally I'd like to see this boolean condition plotted across all pods and session ids. Thanks!
Try removing pod_name from the group_left() modifier:
((sum by (session_id, pod_name) (rate(relay_node_audio_track_bytes{pod_name=~"$pod",session_id=~"$session"}[$__rate_interval]))) == bool 0)
* on (session_id) group_left()
((sum by (session_id) (label_replace(rate(webrtc_metrics_audio_outbound_rtp_bytes_sent{app="capturer",id=~"$session"}[$__rate_interval]), "session_id","$1","id","(.*)"))) > bool 0) > 0
Prometheus keeps all the labels from the left side after applying the * operator (or any other operator) if the group_left() modifier is used. That is, the original pod_name values from the left side remain in the results after calculating * with the group_left() modifier.
The labels listed inside the group_left() modifier are taken from the matching time series on the right side of *. In this case, the time series returned from the right side of * have no pod_name label. That's why the original values for this label obtained from the left side are replaced with empty values from the right side, i.e. they are effectively deleted. This may result in a duplicate time series error when the same session_id value is present in multiple time series with different pod_name values on the left side of *.
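The label handling described above can be illustrated with a small Python sketch. This is a simplified model written for this answer, not Prometheus's actual code: the function group_left_join, its parameters, and the sample label values (s1, relay-a, relay-b) are all hypothetical, and sample values are left out so that only the labels are visible.

```python
# Simplified model of `on(...) group_left(...)` label handling.
# Hypothetical helper for illustration; not Prometheus's implementation.
def group_left_join(many_side, one_side, on, include=()):
    """Return the result label sets of a many-to-one match.

    many_side, one_side: lists of label dicts (sample values omitted).
    on: labels used to pair series.
    include: labels listed inside group_left().
    """
    # Index the "one" side by the values of its matching labels.
    one_by_key = {tuple(s.get(name, "") for name in on): s for s in one_side}

    seen = set()
    results = []
    for series in many_side:
        key = tuple(series.get(name, "") for name in on)
        match = one_by_key.get(key)
        if match is None:
            continue  # no partner on the "one" side, so no output series

        labels = dict(series)  # start from the "many" (left) side labels
        for name in include:
            value = match.get(name, "")
            if value:
                labels[name] = value      # copied from the "one" side
            else:
                labels.pop(name, None)    # empty on the "one" side: dropped

        signature = tuple(sorted(labels.items()))
        if signature in seen:
            raise ValueError(
                "multiple matches for labels: "
                "grouping labels must ensure unique matches"
            )
        seen.add(signature)
        results.append(labels)
    return results


# Assumed example data: one session consumed by two relay pods.
relay = [
    {"session_id": "s1", "pod_name": "relay-a"},
    {"session_id": "s1", "pod_name": "relay-b"},
]
capturer = [{"session_id": "s1"}]  # right side has no pod_name label

# group_left(): pod_name is kept, so both result series stay distinct.
print(group_left_join(relay, capturer, on=["session_id"]))

# group_left(pod_name): pod_name is dropped, the two series collide.
try:
    group_left_join(relay, capturer, on=["session_id"], include=["pod_name"])
except ValueError as err:
    print(err)

# A single selected pod: pod_name is still dropped, but nothing collides.
print(group_left_join(relay[:1], capturer, on=["session_id"], include=["pod_name"]))
```

Under this model, selecting a single Pod avoids the error only because each session_id then has one series on the left side, so dropping pod_name cannot produce duplicates. With a wildcard, several pods share a session_id and the collision appears.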
See more details in the official docs.