Are time-related OpenTelemetry metrics an anti-pattern?


When setting up metrics and telemetry for my API, is it an anti-pattern to track something like "request-latency" as a metric, possibly in addition to tracking it as a span?

For example, say my API makes a request to another API in order to generate a response. If I want to track latency information such as:

  • My API's response latency
  • The latency for the request from my API to the upstream API
  • DB request latency
  • Etc.

That seems like a good candidate for a span, but I think it would also be helpful to have it as a metric.

Is it a bad practice to duplicate the OTEL data capture (as both a metric and a span)?

I can likely extract that information and avoid duplication, but it might be simpler to log it as a metric as well.

Thanks in advance for your help.


Solution

  • I would say traces and metrics each have their own use cases. Traces usually have a short retention period (AWS X-Ray: 30 days), and you can generate metrics from traces for a short time window (AWS X-Ray: 24 hours). If you need a longer window, those queries become expensive and slow. So metrics stored in a time series DB are a good fit for longer-term stats.

    BTW: there is also the experimental Span Metrics Processor, which you can use to generate Prometheus metrics directly from spans in the OTEL Collector, with no additional app instrumentation or code.
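
To make the "metrics derived from spans" idea concrete, here is a minimal sketch in plain Python of the kind of aggregation such a processor performs: span durations are folded into one latency histogram per span name. This is only a conceptual illustration, not the collector's actual implementation. The `Span` class, the bucket bounds, and the `spans_to_latency_histograms` function are all hypothetical names made up for this example.

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical bucket upper bounds in milliseconds (inclusive), for illustration only.
BUCKET_BOUNDS_MS = [5, 10, 25, 50, 100, 250, 500, 1000]


@dataclass
class Span:
    """Simplified stand-in for a finished span (not an OpenTelemetry SDK type)."""
    name: str
    start_ns: int
    end_ns: int


def _empty_histogram():
    # One counter per bound, plus a final overflow bucket for slower requests.
    return {"counts": [0] * (len(BUCKET_BOUNDS_MS) + 1), "sum_ms": 0.0, "count": 0}


def spans_to_latency_histograms(spans):
    """Aggregate span durations into one latency histogram per span name."""
    histograms = defaultdict(_empty_histogram)
    for span in spans:
        duration_ms = (span.end_ns - span.start_ns) / 1_000_000
        # Index of the first bound >= duration; len(bounds) means overflow.
        bucket = bisect_left(BUCKET_BOUNDS_MS, duration_ms)
        hist = histograms[span.name]
        hist["counts"][bucket] += 1
        hist["sum_ms"] += duration_ms
        hist["count"] += 1
    return dict(histograms)


if __name__ == "__main__":
    # Made-up spans for the DB and upstream calls mentioned in the question.
    example = [
        Span("db.query", 0, 7_000_000),
        Span("db.query", 0, 30_000_000),
        Span("upstream.call", 0, 2_000_000_000),
    ]
    for name, hist in spans_to_latency_histograms(example).items():
        print(name, hist)
```

The histogram keeps only counts and sums, which is why it is cheap to retain for much longer than the spans themselves. The trade-off is that per-request detail is lost once the spans expire.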