I want to measure how long it takes to process some data: my application reads that data from a given source at a fixed rate. Before each cycle I store Instant.now()
. I read the data, attaching that single timestamp to each entry. The data gets stored and converted, and just before I send it out via WebSockets, I want to measure the duration between now()
and that initial timestamp.
I tried
long millis = Duration.between(dataEntry.getReceivedTimestamp(), Instant.now()).toMillis();
LOG.info(millis + "ms");
registry.timer("processingDuration").record(millis, TimeUnit.MILLISECONDS);
but when visualizing this I can only use processingDuration_seconds_count
, _max
and _sum
. count
and sum
increase over time (of course), and max
is constant most of the time. So how do I see higher and lower plateaus of load? I tried irate(processingDuration_seconds_sum[10m])
to at least see the jumps, but since irate()
only uses two datapoints, I still cannot easily identify longer periods of high load. Plus: the value in the graph is around 0.6, while the logged values are around 5-10 ms, so I'm losing the actual value here.
So I tried to use a Gauge
instead, which is supposed to allow both increasing and decreasing values:
registry.gauge("processingDurationGauge", millis);
I expected this to go up and down within the range of the logged ms, but it is constantly 92.
How do I measure the processing time of my data over time?
Using a timer and record
is the correct solution.
long millis = Duration.between(dataEntry.getReceivedTimestamp(), Instant.now()).toMillis();
LOG.info(millis + "ms");
registry.timer("processingDuration").record(millis, TimeUnit.MILLISECONDS);
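As an aside, the constant 92 from the gauge attempt is expected: a gauge samples a value *source* at scrape time, and handing it a plain long hands it a one-off snapshot that can never change (Micrometer also only keeps a weak reference to such a boxed number). A stdlib-only sketch of the difference — the DoubleSupplier stands in for the gauge's sampling; mapped back to Micrometer, you would keep the AtomicLong returned by registry.gauge(name, new AtomicLong()) and set() it on each cycle:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.DoubleSupplier;

public class GaugeDemo {
    public static void main(String[] args) {
        // Handing the gauge a plain long captures a one-off snapshot:
        long snapshot = 92L;
        DoubleSupplier frozenGauge = () -> (double) snapshot;

        // Hand it a mutable holder instead, and update the holder each cycle:
        AtomicLong latestMillis = new AtomicLong(92);
        DoubleSupplier liveGauge = latestMillis::doubleValue;

        latestMillis.set(7); // a new measurement arrives
        System.out.println(frozenGauge.getAsDouble()); // 92.0 — never moves
        System.out.println(liveGauge.getAsDouble());   // 7.0 — tracks the holder
    }
}
```

But for durations, a timer is still the better fit, since it gives you sum, count, and max for free.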
Assuming you are scraping every 30 seconds, you can use the _sum
and _count
to get the average recorded duration:
increase(processingDuration_seconds_sum[1m])/increase(processingDuration_seconds_count[1m])
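As a sanity check on what that query computes, here is the same arithmetic with hypothetical counter values: dividing the delta of the _sum by the delta of the _count between two scrapes yields the mean duration of the recordings in that window — in the original units (seconds), not the diluted per-second rate that irate() of the sum alone gives you:

```java
public class AverageFromCounters {
    public static void main(String[] args) {
        // Hypothetical counter values at two consecutive scrapes:
        double sumBefore = 12.0, sumAfter = 12.9;    // _sum is in seconds
        long countBefore = 1000, countAfter = 1150;  // _count: recordings so far

        // increase(sum) / increase(count) = average duration per recording
        double avgSeconds = (sumAfter - sumBefore) / (countAfter - countBefore);
        System.out.printf("%.1f ms%n", avgSeconds * 1000); // prints "6.0 ms"
    }
}
```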
And if you wanted to compare how the current duration is behaving compared to the average over the last day:
((increase(processingDuration_seconds_sum[1m])/
increase(processingDuration_seconds_count[1m]))*1.50) >
increase(processingDuration_seconds_sum[24h])/
increase(processingDuration_seconds_count[24h])
That would only return values where the 1m average is more than 1.5x the daily average. (I haven't tested that query, but it should convey the idea.)
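If averages aren't enough and you want to see the spread of durations (e.g. p95 plateaus under load), Micrometer can also publish a Prometheus histogram for the timer. Assuming a Spring Boot application, a config fragment like the following enables it for this one metric (if you configure Micrometer directly instead, the equivalent is publishPercentileHistogram() on the Timer builder):

```properties
management.metrics.distribution.percentiles-histogram.processingDuration=true
```

With the buckets exported, a query such as histogram_quantile(0.95, sum(rate(processingDuration_seconds_bucket[5m])) by (le)) shows the 95th-percentile duration over time.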