prometheus-operator: what is the difference between shards and replicas?


The following manifest creates a Prometheus server with two replicas and two shards:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: prometheus
  name: prometheus
  namespace: default
spec:
  serviceAccountName: prometheus
  replicas: 2
  shards: 2
  serviceMonitorSelector:
    matchLabels:
      team: frontend

What is the difference between replicas and shards?


Solution

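  • Sharding splits the work across multiple servers to improve scalability and reduce the load on each one. In the Prometheus Operator, the shards field distributes the scrape targets across the shards, so each shard is responsible for collecting and storing only a subset of the total metrics. No single shard holds the full data set, so a global view needs a layer that queries across shards (for example Thanos or federation).

    Replication runs multiple copies of each shard to increase availability and fault tolerance. Every replica of a shard scrapes the same targets independently and keeps its own copy of that shard's data. Prometheus does not propagate data between replicas, so a replica that was down has a gap for that period while the other replica does not.

    The total number of pods is replicas multiplied by shards, so the manifest above creates four Prometheus pods: two shards with two replicas each.

    Sharding and replication are generic concepts, widely used in databases, and are not specific to Prometheus.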
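
    To make "each shard collects a subset of the metrics" concrete, here is a minimal sketch of how targets can be split with Prometheus relabeling. The operator generates the scrape configuration itself, so treat this as an illustration of the hashmod technique under the assumption of shards: 2, not as the operator's exact output; the label name __tmp_hash is a placeholder chosen for this example.

    ```yaml
    # Illustrative relabeling for shard 0 of 2 (not the operator's literal output).
    relabel_configs:
      # Hash the target address into one of 2 buckets (0 or 1).
      - source_labels: [__address__]
        modulus: 2
        target_label: __tmp_hash
        action: hashmod
      # Shard 0 keeps only targets in bucket 0; shard 1 would use regex "1".
      - source_labels: [__tmp_hash]
        regex: "0"
        action: keep
    ```

    All replicas of one shard would run with the same rules, which is why they scrape the same targets and end up with equivalent data.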