I have a simple spring boot app with the following config (the project is available here on GitHub):
management:
metrics:
export:
simple:
mode: step
endpoints:
web:
exposure:
include: "*"
The above config creates SimpleMeterRegistry
and configures its metrics to be step-based, with 60 seconds step. I have one script that sends 50-100 requests per second to the service dummy endpoint and there's the other script that polls the data from /actuator/metrics/http.server.requests
every X seconds. When I run the latter script every 60 seconds everything works as expected, but when the script is run every 120 seconds, the response always contains zeros for TOTAL_TIME
and COUNT
metrics.
Can anyone explain this behavior?
I have read the documentation here. The picture below could indicate that a registry will try to aggregate the data for the previous interval only if pollAsRate is called during the current interval. This will explain why it does not work for 120 seconds interval. But this is just my assumption, does anyone know what is really happening here?
Spring boot version: 2.1.7.RELEASE
UPDATE
I did a similar test with management.metrics.export.simple.step=10s
, it works fine when polling interval is 10s and not working when it is 20s. For 15s interval it sporadically works. So, it's definitely related to the step size and polling frequency.
Finally figured out what is happening.
On every request to /actuator/metrics
, MetricsEndpoint
is going to merge measures (see here). That is done by collecting values for all meters with measurement.getValue()
. The StepMeasurement.getValue()
will not simply return the value, it will update the current and the previous intervals and counts, and roll the count (see here and here).
StepMeasurement.getValue
public double getValue() {
double absoluteCount = (Double)this.f.get();
double inc = Math.max(0.0D, absoluteCount - this.lastCount.sum());
this.lastCount.add(inc);
this.value.getCurrent().add(inc);
return this.value.poll();
}
StepDouble.poll
public double poll() {
rollCount(clock.wallTime());
return previous;
}
How is this related to the polling interval? If you do not poll /actuator/metrics
endpoint, the current and previous intervals will not be updated, thus resulting in the current interval not being up-to-date and metrics being recorded for the "wrong" interval.