I have a number of queries for alerts and dashboard use that have an incredible amount of copy/paste boilerplate for filtering and enriching with labels.
Is there no way to save and re-use this repetitive PromQL server-side, like views in SQL databases? Server-side functions, macros, ... anything?
(I know I can "save" the query as a recording rule, but this is incredibly inefficient when joining on labels. The recording rule has to expose every label anyone might want, producing painfully high cardinality and expensive info-metrics. It wastes storage and memory, and "solving" it with "just add another TB of RAM for Prometheus" is fashionable in cloud, but incredibly wasteful.)
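(For reference, "saving" the query as a recording rule means something like the sketch below; the rule group, rule name and recorded metric name are made up, and the expr would carry the full enrichment join, so that every label anyone might want is baked into the stored series.)

```yaml
groups:
  - name: pvc_enrichment
    rules:
      # Hypothetical rule name: the expr is the full enrichment join,
      # so every possibly-wanted label is recorded up front.
      - record: org:kubelet_volume_stats_available_bytes:enriched
        expr: |
          sum without(job,instance,service,endpoint,metrics_path,prometheus) (
            kubelet_volume_stats_available_bytes
            * on (namespace,persistentvolumeclaim)
              group_left(some_org_specific_label,other_org_specific_label)
              group by (namespace,persistentvolumeclaim,some_org_specific_label,other_org_specific_label) (
                kube_persistentvolumeclaim_labels
              )
          )
```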
Consider this query using kube-state-metrics data:
sum without(job,instance,service,endpoint,metrics_path,prometheus) (
  kubelet_volume_stats_available_bytes{kube_cluster="$kube_cluster"}
  # enrich with labels
  * on (namespace,persistentvolumeclaim)
    group_left(some_org_specific_label,other_org_specific_label)
    group by (namespace,persistentvolumeclaim,some_org_specific_label,other_org_specific_label) (
      kube_persistentvolumeclaim_labels{kube_cluster="$kube_cluster"}
    )
  * on (namespace,persistentvolumeclaim)
    group_left(persistentvolume)
    group by (namespace,persistentvolumeclaim,persistentvolume) (
      # for some reason kube_persistentvolumeclaim_info uses the label "volumename"
      # while kube_persistentvolume_info uses "persistentvolume"
      label_replace(
        kube_persistentvolumeclaim_info{kube_cluster="$kube_cluster"},
        "persistentvolume", "$1", "volumename", "^(.*)$"
      )
    )
  * on (persistentvolume)
    group_left(csi_driver,csi_volume_handle,storageclass)
    group by (persistentvolume,csi_driver,csi_volume_handle,storageclass) (
      kube_persistentvolume_info{kube_cluster="$kube_cluster"}
    )
)
This just says:

- take kubelet_volume_stats_available_bytes for values on kube_cluster="$kube_cluster"
- join kube_persistentvolumeclaim_labels to add the labels some_org_specific_label and other_org_specific_label
- join kube_persistentvolumeclaim_info to find the persistentvolume name associated with the persistentvolumeclaim (handling the label-name inconsistency)
- join kube_persistentvolume_info to find the persistent volume's CSI driver, volume handle and storage class

It's ugly, but it's not that bad... until you also want to write another query for disk space percentage free, and another for I/O thresholds, and so on. Each of these repeats all the same boilerplate.
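For example, a percentage-free variant changes only the arithmetic at the top, yet still has to repeat every join (a sketch; kubelet_volume_stats_capacity_bytes is the matching capacity metric, and the two kubelet metrics share the same label set, so a plain `/` matches one-to-one):

```
sum without(job,instance,service,endpoint,metrics_path,prometheus) (
  (
    kubelet_volume_stats_available_bytes{kube_cluster="$kube_cluster"}
    / kubelet_volume_stats_capacity_bytes{kube_cluster="$kube_cluster"}
  ) * 100
  # ...followed by exactly the same three enrichment joins as above
)
```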
I also seem to need to push filter criteria down into each join operand by hand to get an efficient query; Prometheus doesn't appear capable of anything like a SQL engine's filter push-down logic.
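Concretely, nothing propagates a selector from one operand into the other, so the filter has to be repeated manually (sketched with a placeholder metric name):

```
# Inefficient: kube_pod_info series are selected for every cluster, even
# though only those matching the left side's uid values survive the join
some_metric{kube_cluster="$kube_cluster"} * on (uid) group_left(node) kube_pod_info

# Efficient: the filter is "pushed down" by hand into each operand
some_metric{kube_cluster="$kube_cluster"} * on (uid) group_left(node) kube_pod_info{kube_cluster="$kube_cluster"}
```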
And that's a short example. Here's another, getting a workload metric enriched with some kube pod labels, a kube pod annotation, and the running container image:
# This aggregation drops unwanted labels, since PromQL lacks a proper
# label_drop(...) operator for dropping non-cardinal labels
sum without(endpoint,instance,job,prometheus,container,uid) (
  some_workload_specific_metric{kube_cluster="$kube_cluster"}
  # join on kube_pod_labels for project-id, PGD info, etc
  * on (uid)
    group_left(org_specific_label_1, org_specific_label_2)
    # note that the group by (...) expression repeats the labels from both the
    # on (...) join key and the subject labels in group_left(...). This protects
    # against churn when unrelated labels are added. It's probably safe to write
    #   group without(container,instance,job)
    # in this case, but better to make the query robust:
    group by (uid, org_specific_label_1, org_specific_label_2) (
      kube_pod_labels{kube_cluster="$kube_cluster"}
    )
  # join on kube_pod_info for the node hosting the pod and the pod IP address
  * on (uid)
    group_left(pod_ip,node)
    group by (uid, pod_ip, node) (
      kube_pod_info{kube_cluster="$kube_cluster"}
    )
  # join on kube_pod_container_info for the container image. Note that we join on container_id too
  * on (uid,container_id)
    group_left(image_spec,image_id)
    group by (uid,container_id,image_spec,image_id) (
      kube_pod_container_info{kube_cluster="$kube_cluster"}
    )
  # join on kube_pod_annotations for org_specific_annotation_1, if any
  * on (uid)
    group_left(org_specific_annotation_1)
    group by (uid,org_specific_annotation_1) (
      kube_pod_annotations{kube_cluster="$kube_cluster"}
    )
)
Imagine repeating that boilerplate in every (say) alerting-rule query that needs those labels exposed...
Is there a saner way to do this? In a SQL database I'd just CREATE VIEW
and join against it. Does everyone just use recording rules and pay the huge blow-out cost of grabbing and recording every label anyone might want, then discarding most of them most of the time?
Prometheus doesn't support anything like SQL common table expressions (CTEs). If you need such functionality, try MetricsQL in VictoriaMetrics: a PromQL-like query language that supports, among other things, WITH expressions. For example, if you want to reuse the same query for different metric names, you can put the query into a WITH template function that accepts the metric name, then call that function with different metric names as needed:
WITH (
  f(m) = some_complex_query_here
) (
  f(foo), # expand the query with the `foo` metric
  f(bar), # expand the query with the `bar` metric
)
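Applied to the question's second query, a WITH template could factor out the enrichment boilerplate. The sketch below is untested, and `enrich` is a made-up name:

```
WITH (
  # hypothetical template wrapping any metric expression with one enrichment join
  enrich(m) = sum without(endpoint,instance,job,prometheus,container,uid) (
    m
    * on (uid)
      group_left(org_specific_label_1, org_specific_label_2)
      group by (uid, org_specific_label_1, org_specific_label_2) (
        kube_pod_labels{kube_cluster="$kube_cluster"}
      )
  )
)
enrich(some_workload_specific_metric{kube_cluster="$kube_cluster"})
```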
Disclaimer: I'm the author of MetricsQL.