metrics, datadog

Datadog distinct-like custom metrics


Given the following scenario:

  • A lambda receives an event via SQS
  • The event contains a UUID pointing to an entity
  • The lambda may fail with an error
  • SQS will retry that particular entity several times
  • The lambda will be called with different entities thousands of times

Right now we monitor a custom error-count metric like myService.errorType, which gives us an exact count of how many times an error occurred, independent of the specific entity: if processing fails 100 times for a single entity, the metric value will be 100.

What I'd like to have, though, is a distinct metric based on the UUID. Example:

  • entity with id 123 fails 10 times
  • entity with id 456 succeeds
  • entity with id 789 fails 20 times

Then I'd like to have a metric with the value of 2, because processing failed for only two entities (and not 30, as would be reported right now).
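To make the difference concrete, here is a small sketch of the two counts for the example above. The `attempts` list is made-up illustration data, not something Datadog or SQS provides in this shape.

```python
from collections import Counter

# Hypothetical attempt log: (entity_id, succeeded) for each lambda invocation
attempts = [("123", False)] * 10 + [("456", True)] + [("789", False)] * 20

# Count failed attempts per entity
failures = Counter(entity_id for entity_id, ok in attempts if not ok)

total_errors = sum(failures.values())  # what the current metric reports
distinct_failed = len(failures)        # what I would like to report instead
```

Here `total_errors` is 30 and `distinct_failed` is 2.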

While searching for a solution I found the possibility of using tags. But as the docs point out, they are not meant for such a use case:

Tags shouldn’t originate from unbounded sources, such as epoch timestamps, user IDs, or request IDs. Doing so may infinitely increase the number of metrics for your organization and impact your billing.

So are there any other possibilities to achieve my goals?


Solution

  • I've now solved it by checking the status in code and adding one of two tags to the metric:

    • occurrence:first
    • occurrence:subsequent

    This way I can filter in my dashboard for occurrence:first only.
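
    A minimal sketch of how such tagging might look in Python. How the status is checked is not spelled out above; this sketch assumes the SQS attribute ApproximateReceiveCount (delivered as a string in each record of the lambda event) is a good enough signal for "first attempt". The helper names occurrence_tag and report_error, and the injected emit callable, are hypothetical.

```python
from typing import Callable, List


def occurrence_tag(record: dict) -> str:
    """Return a bounded tag based on how often SQS has delivered this message."""
    # Assumption: a receive count of 1 means this is the first attempt.
    receive_count = int(
        record.get("attributes", {}).get("ApproximateReceiveCount", "1")
    )
    return "occurrence:first" if receive_count <= 1 else "occurrence:subsequent"


def report_error(record: dict, emit: Callable[[str, int, List[str]], None]) -> None:
    """Emit one error count, tagged with the occurrence of this attempt."""
    emit("myService.errorType", 1, [occurrence_tag(record)])


# Usage with a stand-in emitter that just collects the calls
sent = []
collect = lambda name, value, tags: sent.append((name, value, tags))

report_error({"attributes": {"ApproximateReceiveCount": "1"}}, collect)
report_error({"attributes": {"ApproximateReceiveCount": "4"}}, collect)
```

    Because the tag has only two possible values, it does not run into the unbounded-tag problem quoted in the question. In a real lambda the emit callable could wrap something like lambda_metric from the datadog_lambda package. Note that the receive count is approximate, and an attempt that dies before emitting anything (for example a timeout) would make the first reported failure look like a subsequent one.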