Goal
Track RPM (requests per minute) and uptime via Grafana and Prometheus.
Situation
We are using:
django-prometheus -> emits the metrics
fluent-bit -> scrapes the Django metrics every 15s and pushes them to Prometheus
prometheus -> 2 shards running via the Prometheus Operator on k8s
Problem
When we compare the Grafana dashboard with the AWS target group request metrics, the numbers don't match. We tried all of the options below:
Expr: sum by(service) (irate(django_http_requests_before_middlewares_total{namespace="name"}[5m]))
Expr: sum by(service) (increase(django_http_requests_before_middlewares_total{namespace="name"}[5m]))
Expr: sum by(service) (rate(django_http_requests_before_middlewares_total{namespace="name"}[5m]))
django_http_requests_before_middlewares_total -> this is a Counter.
This counter never resets because each series has unique dimensions:
- container_id
- service_name
- namespace
Q. Is it possible to create a Grafana dashboard that resembles the AWS target group metrics?
Ideally increase should work, but it computes the difference continuously, and that might be giving incorrect results.
Thanks in advance.
In theory the following query should return the exact number of per-service requests for the last minute:
sum(
increase(django_http_requests_before_middlewares_total[1m])
) by (service)
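But in practice Prometheus may return unexpected results for this query:
- Prometheus doesn't take into account the difference between the last raw sample before the lookbehind window (the lookbehind window is set in square brackets, [1m] in the query above) and the first raw sample in the lookbehind window.
- increase(m[d]) would return empty results for d <= 1m.
Prometheus developers are aware of these issues and are going to fix them - see this design doc.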
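To illustrate the lookbehind-window boundary effect, here is a minimal Python sketch. It is a simplified model and not Prometheus's actual implementation (which also extrapolates and handles counter resets). The sample data, the function names, and the choice of a left-open window `(end - window, end]` are all assumptions made for illustration.

```python
# Hypothetical (timestamp_seconds, counter_value) samples, scraped every 15s.
# The counter grows by 10 per scrape, i.e. 40 per minute.
samples = [(0, 100), (15, 110), (30, 120), (45, 130), (60, 140), (75, 150)]


def in_window_increase(samples, end, window):
    """Difference between the last and first samples inside (end - window, end].

    Ignores whatever happened between the last sample before the window
    and the first sample inside it.
    """
    inside = [v for t, v in samples if end - window < t <= end]
    if len(inside) < 2:
        return None  # fewer than two samples: no difference can be computed
    return inside[-1] - inside[0]


def boundary_aware_increase(samples, end, window):
    """Difference between the last sample in the window and the last sample
    at or before the window start, so the boundary gap is counted too."""
    before = [v for t, v in samples if t <= end - window]
    inside = [v for t, v in samples if end - window < t <= end]
    if not inside:
        return None
    baseline = before[-1] if before else inside[0]
    return inside[-1] - baseline


naive = in_window_increase(samples, end=75, window=60)
exact = boundary_aware_increase(samples, end=75, window=60)
print(naive, exact)
```

With these made-up samples the in-window difference misses the 10 requests that arrived between the sample at t=15 and the sample at t=30, while the boundary-aware variant counts them.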
In the meantime you can try the increase() function in VictoriaMetrics - this is a Prometheus-like monitoring solution I work on. Its increase function is free from the issues mentioned above.
An important note: both Prometheus and VictoriaMetrics calculate query results independently for each point displayed on the graph. So, if you need to display the per-minute number of requests using the query above, you need to set the interval between points on the graph (aka step) to one minute.
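As a rough sketch of what "step equal to the lookbehind window" looks like outside Grafana, the snippet below builds a request URL for the Prometheus HTTP API range-query endpoint (`/api/v1/query_range`, with its `query`, `start`, `end` and `step` parameters). The base URL, the timestamps and the helper name `build_query_range_url` are hypothetical; no request is actually sent.

```python
from urllib.parse import urlencode

# Same query as above, written on one line.
QUERY = (
    "sum(increase(django_http_requests_before_middlewares_total[1m])) "
    "by (service)"
)


def build_query_range_url(base_url, query, start, end, step_seconds):
    """Build a range-query URL; step_seconds is the distance between points."""
    params = {
        "query": query,
        "start": start,        # Unix timestamp, seconds
        "end": end,            # Unix timestamp, seconds
        "step": step_seconds,  # 60 matches the [1m] lookbehind window
    }
    return f"{base_url}/api/v1/query_range?{urlencode(params)}"


# Hypothetical server address and one-hour time range.
url = build_query_range_url(
    "http://prometheus.example:9090", QUERY, 1700000000, 1700003600, 60
)
print(url)
```

With a 60-second step and a [1m] window, consecutive points cover back-to-back one-minute windows, so each point on the graph can be read as the request count for that minute.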