
Why are my tagged metrics disappearing/resetting after 10 minutes, and can I prevent it?


I'm using a System.Diagnostics.Metrics.Counter<T> in .NET 7 in a library to count an event with a few different tag values.

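using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Wrapper class and method names are illustrative.
public static class MyLibraryMetrics
{
    // Meter creation
    private static readonly Meter meter = new Meter("my_library");

    // Counter creation
    private static readonly Counter<int> myCounter =
        meter.CreateCounter<int>("my_metric_counter", description: "My counter");

    // When event A happens
    public static void OnEventA() =>
        myCounter.Add(1, KeyValuePair.Create<string, object?>("tag_key", "tag value A"));

    // When event B happens
    public static void OnEventB() =>
        myCounter.Add(1, KeyValuePair.Create<string, object?>("tag_key", "tag value B"));
}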

The application that includes the library uses prometheus-net to start a metrics server:

applicationBuilder.UseMetricServer();

With this setup, Prometheus successfully scrapes the application and retrieves my metrics, which look like this:

my_library_my_metric_counter{tag_key="tag value A"} 10
my_library_my_metric_counter{tag_key="tag value B"} 20

The problem appears when a particular tag value of the counter is not updated by the library for about 10 minutes. Say "tag value B" isn't counted for more than 10 minutes. After that, a metrics scrape returns only:

my_library_my_metric_counter{tag_key="tag value A"} 100

This happens even though Prometheus is continually scraping the metrics endpoint at 30-second intervals.

If "tag value B" is counted again after this 10-minute window, the metric reappears in Prometheus (and in a manual scrape of the app's /metrics endpoint), but its value has been reset to 1:

my_library_my_metric_counter{tag_key="tag value A"} 200
my_library_my_metric_counter{tag_key="tag value B"} 1

I'm not sure whether this is System.Diagnostics.Metrics behavior or prometheus-net metric server behavior. Either way, I'm looking for a way to preserve these metrics so they are not reset after 10 minutes of inactivity.
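One way to narrow this down is to attach a MeterListener directly to the counter. A Counter<T> keeps no running total of its own: each Add() call is forwarded as a delta, with its tags, to whichever listeners are subscribed, so any cumulative total (and any expiry of idle tag combinations) is implemented by the listener, which here is prometheus-net. The minimal sketch below reuses the meter, instrument and tag names from the question; the CounterProbe class and its in-memory dictionary are hypothetical stand-ins for an exporter.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Diagnostics.Metrics;

public static class CounterProbe
{
    // Records a few tagged increments and returns the per-tag totals that a
    // listener (standing in for prometheus-net) accumulated from them.
    public static Dictionary<string, long> CountByTag()
    {
        using var meter = new Meter("my_library");
        Counter<int> myCounter = meter.CreateCounter<int>("my_metric_counter", description: "My counter");

        // The Counter stores nothing; the listener owns this state.
        var totals = new Dictionary<string, long>();

        using var listener = new MeterListener();
        listener.InstrumentPublished = (instrument, l) =>
        {
            // Subscribe only to instruments created by this meter.
            if (instrument.Meter == meter)
                l.EnableMeasurementEvents(instrument);
        };
        listener.SetMeasurementEventCallback<int>((instrument, value, tags, state) =>
        {
            // Each callback receives the delta passed to Add(), plus its tags.
            foreach (KeyValuePair<string, object?> tag in tags)
            {
                if (tag.Key == "tag_key")
                {
                    string key = tag.Value?.ToString() ?? "";
                    totals[key] = totals.GetValueOrDefault(key) + value;
                }
            }
        });
        // Start() also reports instruments that already exist (myCounter).
        listener.Start();

        myCounter.Add(1, KeyValuePair.Create<string, object?>("tag_key", "tag value A"));
        myCounter.Add(1, KeyValuePair.Create<string, object?>("tag_key", "tag value B"));
        myCounter.Add(1, KeyValuePair.Create<string, object?>("tag_key", "tag value A"));

        return totals;
    }
}
```

Calling CounterProbe.CountByTag() yields 2 for "tag value A" and 1 for "tag value B". Nothing in this listener ever drops a tag value, which points at the exporter's aggregation, not the Counter, as the place where idle series are expired.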

My library code also has a few System.Diagnostics.Metrics.ObservableGauge instances. Interestingly, the metrics from these gauges, though rarely updated, do NOT disappear after 10 minutes without an update (perhaps because each scrape effectively counts as an update?). One might suggest using ObservableGauge instead of Counter, but I've yet to find a way to associate tags with an ObservableGauge.
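For reference, System.Diagnostics.Metrics does let observable instruments carry tags: the callback can return several Measurement<T> values, each constructed with its own tags. Because the data here are counts, the minimal sketch below uses ObservableCounter<T> rather than ObservableGauge<T> (CreateObservableGauge accepts the same kind of callback). The TaggedTotals and TaggedTotalsDemo names are hypothetical; the library keeps the running totals itself, and MeterListener.RecordObservableInstruments() stands in for the exporter polling on each scrape. It assumes, as the gauge behavior above suggests, that values read on every collection keep a series from expiring, and that prometheus-net treats ObservableCounter values as cumulative totals; both are worth checking against the prometheus-net version in use.

```csharp
#nullable enable
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics.Metrics;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical helper: the library owns the running totals per tag value and
// publishes them through an ObservableCounter that is read on every collection.
public sealed class TaggedTotals
{
    // One boxed long per tag value so Interlocked can update it in place.
    private readonly ConcurrentDictionary<string, StrongBox<long>> _totals = new();
    private readonly string _tagKey;

    public TaggedTotals(Meter meter, string name, string tagKey)
    {
        _tagKey = tagKey;
        meter.CreateObservableCounter<long>(name, () => Observe());
    }

    // Called by the library when an event with the given tag value happens.
    public void Increment(string tagValue) =>
        Interlocked.Increment(ref _totals.GetOrAdd(tagValue, _ => new StrongBox<long>()).Value);

    // Reports the cumulative total for every tag value seen so far, each as a
    // separate Measurement<T> carrying its own tag.
    private IEnumerable<Measurement<long>> Observe()
    {
        foreach (var (tagValue, box) in _totals)
        {
            long total = Interlocked.Read(ref box.Value);
            yield return new Measurement<long>(
                total, KeyValuePair.Create<string, object?>(_tagKey, tagValue));
        }
    }
}

public static class TaggedTotalsDemo
{
    // Usage: count some events, then poll the instrument like an exporter would.
    public static Dictionary<string, long> Collect()
    {
        using var meter = new Meter("my_library");
        var totals = new TaggedTotals(meter, "my_metric_counter", "tag_key");
        totals.Increment("tag value A");
        totals.Increment("tag value B");
        totals.Increment("tag value A");

        var seen = new Dictionary<string, long>();
        using var listener = new MeterListener();
        listener.InstrumentPublished = (instrument, l) =>
        {
            if (instrument.Meter == meter)
                l.EnableMeasurementEvents(instrument);
        };
        listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
        {
            foreach (var tag in tags)
                seen[tag.Value?.ToString() ?? ""] = value;
        });
        listener.Start();

        // Observable instruments are only read when a listener asks for them.
        listener.RecordObservableInstruments();
        return seen;
    }
}
```

The design choice is that the totals live in the library, so a tag value that goes quiet keeps being reported with its full count on every collection, rather than relying on the exporter to remember it.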


Solution

  • After stumbling across the same issue, I found that this timeout is configured in prometheus-net through MeterAdapterOptions.MetricsExpireAfter.

    The documentation also mentions that it can be configured using Metrics.ConfigureMeterAdapter().

    So for me, the solution was to add something like this to Program.cs. It should be executed before the first metrics collection:

    Metrics.ConfigureMeterAdapter(o => o.MetricsExpireAfter = TimeSpan.FromDays(1));
    

    Looking at the prometheus-net source code, this might only work if you call SuppressDefaultMetrics() with SuppressMeters = false. If you do not call SuppressDefaultMetrics() at all, the MeterAdapterOptions might not be applied.

    Note also that MetricsExpireAfter must not be greater than 1 day (source), so the expiry cannot be made infinite or disabled.
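Putting the pieces of this answer together, a Program.cs for an ASP.NET Core app might look like the sketch below. It assumes the prometheus-net.AspNetCore package and the SuppressDefaultMetricOptions type with a SuppressMeters property as found in recent prometheus-net versions. Calling ConfigureMeterAdapter before SuppressDefaultMetrics is a cautious guess based on the "before the first metrics collection" requirement, not a documented rule.

```csharp
using System;
using Microsoft.AspNetCore.Builder;
using Prometheus;

// Configure prometheus-net before anything can trigger the first collection.
// 1 day is the largest expiry the answer reports as allowed.
Metrics.ConfigureMeterAdapter(o => o.MetricsExpireAfter = TimeSpan.FromDays(1));

// Explicitly keep the Meter adapter enabled; per the answer, the options above
// may only be applied when SuppressDefaultMetrics is called like this.
Metrics.SuppressDefaultMetrics(new SuppressDefaultMetricOptions
{
    SuppressMeters = false,
});

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// Same metrics endpoint as in the question.
app.UseMetricServer();
app.Run();
```

With this in place, a tag value that stays idle for less than a day should keep its series; beyond that, the expiry still applies.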