
How to make sense of the micrometer metrics using SpringBoot 2, InfluxDB and Grafana?


I'm trying to configure a SpringBoot application to export metrics to InfluxDB so that I can visualise them in a Grafana dashboard. As an example I'm using this dashboard, which uses Prometheus as a backend. For some metrics I have no problem figuring out how to create graphs, but for others I don't know how to create the graphs or even whether it's possible at all. The points I'm not sure about are:

  • Is there any documentation that describes the unit of each value? The application I'm using as an example doesn't have any load on it, so sometimes I can't tell whether a value is in bits, bytes, seconds, milliseconds, a count, etc.

  • Some measurements contain the tag 'metric_type = histogram' with fields 'count', 'sum', 'mean' and 'upper'. Again, I don't know what the value units are, what 'upper' means or how I'm supposed to plot them. Examples of this are 'http_server_requests' and 'jvm_gc_pause'.

  • From what I see in the Grafana dashboard example, it seems I should use these measurements of type histogram to create both a graph with counts and graphs with duration. For example I see I should be able to create a graph with the number of requests and another one with their duration. Or for the garbage collector, I should be able to provide a graph for the number of minor and major GCs and another for their duration.

Examples of the points that get written to InfluxDB:

time                 count  exception  mean      method  metric_type  outcome  status  sum       upper     uri
1625579637946000000  1      None       0.892144  GET     histogram    SUCCESS  200     0.892144  0.892144  /actuator/health

or

time                 action           cause               count  mean  metric_type  sum  upper
1625581132316000000  end of minor GC  Allocation Failure  1      2     histogram    2    2

Solution

  • I agree the documentation for Micrometer is not great. I've had to dig through the code to find answers...

    Regarding your questions about jvm_gc_pause: it is a Timer, implemented by AbstractTimer, a class that wraps a Histogram among other components. This particular metric is registered by the JvmGcMetrics class. The fields that are published to InfluxDB are determined by the InfluxMeterRegistry.writeTimer(Timer timer) method:

    • sum: timer.totalTime(getBaseTimeUnit()) // The total time of recorded events
    • count: timer.count() // The number of times stop has been called on the timer
    • mean: timer.mean(getBaseTimeUnit()) // totalTime()/count()
    • upper: timer.max(getBaseTimeUnit()) // The max time of a single event

    The base time unit is milliseconds, so sum, mean and upper are durations in milliseconds, while count is a plain number of events.
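To make the relationship between the four fields concrete, here is a minimal sketch in plain Java. It does not use Micrometer at all: TimerFields is a hypothetical stand-in that only mirrors the arithmetic described above (a running count, a running total and a running maximum, converted to a chosen time unit), so treat it as an illustration rather than the library's implementation.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical stand-in for a timer. It only reproduces the arithmetic
 * behind count, sum, mean and upper; it is not Micrometer code.
 */
final class TimerFields {
    private long count;       // number of recorded events
    private long totalNanos;  // total recorded time, kept in nanoseconds
    private long maxNanos;    // longest single event, in nanoseconds

    /** Records one event of the given duration. */
    void record(long amount, TimeUnit unit) {
        long nanos = unit.toNanos(amount);
        count++;
        totalNanos += nanos;
        maxNanos = Math.max(maxNanos, nanos);
    }

    /** "count": how many events were recorded. */
    long count() {
        return count;
    }

    /** "sum": total recorded time, expressed in the requested unit. */
    double sum(TimeUnit unit) {
        return convert(totalNanos, unit);
    }

    /** "mean": sum divided by count (0 when nothing was recorded). */
    double mean(TimeUnit unit) {
        return count == 0 ? 0.0 : sum(unit) / count;
    }

    /** "upper": the longest single event, expressed in the requested unit. */
    double upper(TimeUnit unit) {
        return convert(maxNanos, unit);
    }

    private static double convert(long nanos, TimeUnit unit) {
        return (double) nanos / unit.toNanos(1);
    }

    public static void main(String[] args) {
        TimeUnit base = TimeUnit.MILLISECONDS; // assumed base unit, as stated above
        TimerFields gcPause = new TimerFields();
        gcPause.record(2, TimeUnit.MILLISECONDS);
        gcPause.record(6, TimeUnit.MILLISECONDS);
        System.out.println("count=" + gcPause.count()
                + " sum=" + gcPause.sum(base)
                + " mean=" + gcPause.mean(base)
                + " upper=" + gcPause.upper(base));
    }
}
```

With two pauses of 2 ms and 6 ms, this gives a count of 2, a sum of 8 ms, a mean of 4 ms and an upper of 6 ms, which is the shape of the rows shown in the question.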

    Similarly, http_server_requests appears to be a Timer as well.

    I believe you are correct that the sensible thing is to chart on two separate Grafana panels: one panel for GC pause time in milliseconds (not seconds, given the base time unit) using sum (or mean or upper), and one panel for the number of GC events using count.
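One detail to watch when a panel groups several points into a single time bucket: the average duration for the bucket is the total of sum divided by the total of count, not the average of the individual mean values. The sketch below shows that arithmetic in plain Java. It assumes each written point covers one reporting interval, and GcPanelMath with its method names is hypothetical, used only for illustration.

```java
/**
 * Hypothetical helper showing how several points could be combined
 * into one value per panel. Not part of Micrometer, InfluxDB or Grafana.
 */
final class GcPanelMath {

    /** Events panel: add up the "count" field of every point in the bucket. */
    static long totalCount(long[] counts) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        return total;
    }

    /**
     * Duration panel: total "sum" divided by total "count", so intervals
     * with more events weigh more. Returns 0 when there were no events.
     */
    static double weightedMeanMs(long[] counts, double[] sumsMs) {
        if (counts.length != sumsMs.length) {
            throw new IllegalArgumentException("counts and sums must have the same length");
        }
        double totalMs = 0.0;
        for (double s : sumsMs) {
            totalMs += s;
        }
        long events = totalCount(counts);
        return events == 0 ? 0.0 : totalMs / events;
    }

    public static void main(String[] args) {
        // Two points: one pause of 2 ms, then three pauses totalling 18 ms.
        long[] counts = {1, 3};
        double[] sumsMs = {2.0, 18.0};
        System.out.println("events=" + totalCount(counts)
                + " meanMs=" + weightedMeanMs(counts, sumsMs));
    }
}
```

In this example the per-point means are 2 ms and 6 ms, whose plain average is 4 ms, while the weighted mean over all four events is 5 ms.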