prometheus thanos

How can deduplication take place in remote object storage in the Thanos ecosystem?


I am exploring Thanos for an existing monitoring cluster. The Thanos querier can perform deduplication, but only at query time. When the shipper uploads data to remote object storage, the data of every Prometheus instance is shipped. If Prometheus runs in HA mode, the shipper therefore uploads duplicate data, and nobody wants to keep duplicated data in storage. So my question is: does Thanos offer a way to deduplicate data in remote object storage, or is some external tooling needed in the cluster?


Solution

  • In the Thanos architecture, each Prometheus instance must define unique external_labels (based on this doc).
    Because these labels differ from one Prometheus instance to another, each replica's data is stored in object storage as a separate set of metrics.
    Setting --query.replica-label=replica on the querier then makes it deduplicate metrics at query time, based on that label.
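
To make the idea concrete, here is a minimal Python sketch of deduplicating by a replica label. It is only an illustration: the sample data, the cluster label, and the helper names dedup_key and deduplicate are hypothetical, and the merge rule (first replica seen wins per timestamp) is a simplification, not the algorithm the Thanos querier actually uses.

```python
from collections import defaultdict

# Assumed label name, matching --query.replica-label=replica on the querier.
REPLICA_LABEL = "replica"


def dedup_key(labels):
    """Return the series identity with the replica label removed."""
    return tuple(sorted((k, v) for k, v in labels.items() if k != REPLICA_LABEL))


def deduplicate(series):
    """Merge series that differ only in the replica label.

    Simplified rule: for each timestamp, keep the value from the
    first replica that reported it.
    """
    merged = defaultdict(dict)
    for labels, samples in series:
        bucket = merged[dedup_key(labels)]
        for timestamp, value in samples:
            bucket.setdefault(timestamp, value)
    return {key: sorted(samples.items()) for key, samples in merged.items()}


# Hypothetical data: two HA replicas scraping the same target.
series = [
    ({"__name__": "up", "job": "node", "cluster": "eu1", "replica": "A"},
     [(10, 1.0), (20, 1.0)]),
    ({"__name__": "up", "job": "node", "cluster": "eu1", "replica": "B"},
     [(10, 1.0), (30, 1.0)]),
]

result = deduplicate(series)
for key, samples in result.items():
    print(dict(key), samples)
```

Once the replica label is ignored, both series share the same identity, so the two replicas collapse into a single series whose gaps are filled from whichever replica has data.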