Tags: jvm, prometheus, metrics, jmx, datadog

Understanding Prometheus JVM metrics


I am interested in tracking two JVM metrics: the number of GCs per minute and the time spent in GC per minute. I have the metrics jvm_gc_collection_seconds.count and jvm_gc_collection_seconds.sum available on my dashboard, but I am a little confused about what they mean.

The first metric, jvm_gc_collection_seconds.count, sounds like it measures time, but from what I have read I believe it is the number of times GC has been invoked since the application started.

  1. Is this right?
  2. If so, why does the metric name contain the word "seconds"?
  3. Would jvm_gc_collection_seconds.count/1 minute give me the number of GC invocations per minute?

I believe the second metric, jvm_gc_collection_seconds.sum, is the total time in seconds spent on GC activity since the application started.

  1. Is that right?
  2. Would jvm_gc_collection_seconds.sum/1 minute give me the time spent in seconds doing GC activity in a 1 minute time window?

Solution

  • jvm_gc_collection_seconds is a summary metric.

    A summary with a base metric name of <basename> exposes multiple time series during a scrape:

    • streaming φ-quantiles (0 ≤ φ ≤ 1) of observed events, exposed as <basename>{quantile="<φ>"}
    • the total sum of all observed values, exposed as <basename>_sum
    • the count of events that have been observed, exposed as <basename>_count

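    So jvm_gc_collection_seconds_count holds the total number of GC events since the application started, and jvm_gc_collection_seconds_sum holds the total number of seconds spent in all of those events. The "seconds" in the name is the unit of the observed values (GC durations); the _count series itself is a plain event count.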

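    Because both values are cumulative, dividing them by one minute does not give a per-minute figure. Instead, take the increase over a 1-minute window: increase(jvm_gc_collection_seconds_count[1m]) gives the number of GC invocations per minute, and increase(jvm_gc_collection_seconds_sum[1m]) gives the time spent in GC per minute.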


    You may also find helpful tips on JVM garbage-collection metrics in this post by Brian Brazil.
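As a rough illustration in plain Python (no Prometheus libraries), the sketch below models a summary that, like this GC metric, exposes cumulative _count and _sum values, samples it every 15 seconds over one minute, and applies a simplified increase(). The class GcSummary, the function simple_increase, the 15-second scrape interval and the pause durations are all hypothetical. Unlike PromQL's increase(), simple_increase does not extrapolate to the edges of the range window; it only adds up differences between consecutive samples and treats a drop in value as a counter reset (for example, a JVM restart).

```python
from dataclasses import dataclass


@dataclass
class GcSummary:
    """Hypothetical stand-in for a summary such as jvm_gc_collection_seconds.

    Quantiles are omitted; only the two cumulative values are kept:
    how many GC events were observed and their total duration.
    """
    count: int = 0
    total_seconds: float = 0.0

    def observe(self, pause_seconds: float) -> None:
        # Every GC event bumps the count and adds its duration to the sum.
        self.count += 1
        self.total_seconds += pause_seconds

    def expose(self, name: str = "jvm_gc_collection_seconds") -> dict:
        # The two series a scrape would see: <basename>_count and <basename>_sum.
        return {f"{name}_count": self.count, f"{name}_sum": self.total_seconds}


def simple_increase(samples: list) -> float:
    """Simplified increase() over (timestamp, value) samples inside one window.

    Sums consecutive differences; a drop in value is treated as a counter
    reset, so the new value counts as growth from zero. No extrapolation.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


summary = GcSummary()
samples_count, samples_sum = [], []
# Hypothetical GC pauses (seconds) that occur just before each scrape time t.
pauses_before_scrape = {0: [0.5, 0.25], 15: [0.125], 30: [], 45: [0.25, 0.125], 60: [0.5]}
for t, pauses in pauses_before_scrape.items():
    for p in pauses:
        summary.observe(p)
    exposed = summary.expose()
    samples_count.append((t, exposed["jvm_gc_collection_seconds_count"]))
    samples_sum.append((t, exposed["jvm_gc_collection_seconds_sum"]))

print(simple_increase(samples_count))  # 4.0 GC events during the minute
print(simple_increase(samples_sum))    # 1.0 second spent in GC during the minute
print(samples_count[-1][1])            # 6: lifetime total, not a per-minute figure
```

The last line shows why dividing the raw cumulative value by one minute is misleading: it reflects every GC since startup, not just those in the last minute.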