
Backup and restore Prometheus metrics


So, my situation is this: I am deploying a product and using Prometheus/Grafana for metrics. Weird things may happen, and when they do I want to get the metrics for investigation. I want to instruct the customer support team on how to collect them and hand them over, but I cannot make it work.

So, following these pages:

I generated the snapshot on the server, and it was saved in a directory named XXXXX-XXXX/XXXXX. I copied those files locally.
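The snapshot step can be scripted so the support team only has to run one command. Below is a minimal sketch using only the Python standard library. It assumes Prometheus is reachable at http://localhost:9090 and was started with --web.enable-admin-api. `snapshot_url`, `parse_snapshot_name` and `take_snapshot` are hypothetical helper names, not part of any Prometheus client. The admin endpoint POST /api/v1/admin/tsdb/snapshot answers with the snapshot's name, and the snapshot itself is written to snapshots/<name> under the server's data directory.

```python
import json
import urllib.request


def snapshot_url(base_url: str) -> str:
    """Build the admin-API snapshot endpoint for a Prometheus base URL."""
    return base_url.rstrip("/") + "/api/v1/admin/tsdb/snapshot"


def parse_snapshot_name(body: bytes) -> str:
    """Extract the snapshot directory name from the API's JSON response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"snapshot request failed: {payload}")
    return payload["data"]["name"]


def take_snapshot(base_url: str = "http://localhost:9090") -> str:
    """POST to the snapshot endpoint (needs --web.enable-admin-api).

    Returns the name of the directory created under snapshots/.
    """
    req = urllib.request.Request(snapshot_url(base_url), method="POST")
    with urllib.request.urlopen(req, timeout=60) as resp:
        return parse_snapshot_name(resp.read())
```

Calling `take_snapshot()` on the server returns a name such as 20171210T211224Z-2be650b6d019eb54, which is the directory to hand over.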

For convenience, I created a docker-compose file like this:

version: '3.8'
services:
  prometheus:
    image: prom/prometheus:v2.16.0
    restart: always
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./prometheus/data:/data:rw
    command:
      - '--storage.tsdb.path=/data'
      - '--web.enable-admin-api'
      - '--config.file=/etc/prometheus/prometheus.yml'
    ports:
      - '9090:9090'

The config file is nothing special, but here it is:

global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  # comment
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
  - job_name: app
    scrape_interval: 5s
    static_configs:
      - targets: ['phony-host:port']

This is a dummy config; those hosts do not even exist, it is just something to have in there.

Now, after I copy the snapshot files into the ./prometheus/data directory and start the container (via docker-compose), I can't see any of the metrics I expect to have in the snapshot. Am I doing something wrong? Is something missing in the config? To be clear, I am not copying the XXX-XXX/XXXX directory itself, only the files inside it.

Also, a few other comments:

  • no errors in the Prometheus logs;
  • I can see TSDB starting (one line of log);
  • I don't see any reference to the existing snapshot.
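When the logs say nothing about the copied data, it helps to check what Prometheus will actually find in its data directory. The sketch below assumes the TSDB on-disk layout, where each block is its own subdirectory (named by a ULID) holding meta.json, index, chunks/ and tombstones. It reports each block's minTime/maxTime (milliseconds since the epoch, read from meta.json) and flags block files that sit directly in the data directory instead of inside a block. `inspect_data_dir` is a hypothetical helper name.

```python
import json
from pathlib import Path

# Files that belong *inside* a block directory, never directly in the data dir.
BLOCK_FILES = {"meta.json", "index", "chunks", "tombstones"}


def inspect_data_dir(data_dir):
    """Return (blocks, misplaced) for a Prometheus --storage.tsdb.path.

    blocks:    list of (dir_name, min_time_ms, max_time_ms), one per
               subdirectory that contains a meta.json (i.e. a TSDB block).
    misplaced: sorted names of block files found directly in data_dir,
               which suggests a block's contents were copied instead of
               the block directory itself.
    """
    root = Path(data_dir)
    blocks = []
    for child in sorted(root.iterdir()):
        meta = child / "meta.json"
        if child.is_dir() and meta.is_file():
            info = json.loads(meta.read_text())
            blocks.append((child.name, info["minTime"], info["maxTime"]))
    misplaced = sorted(p.name for p in root.iterdir() if p.name in BLOCK_FILES)
    return blocks, misplaced
```

Running `inspect_data_dir("./prometheus/data")` before starting the container should list at least one block and no misplaced files. The reported time range also shows which window to query in Prometheus or Grafana, since restored data is historical.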

Solution

  • Solved. It was my mistake, though admittedly the documentation is not 100% clear either.

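    So, I had the snapshot stored in a directory like {DATA}\{XXXX-XXXX}\{YYYY}. My mistake was copying the contents of {XXXX-XXXX}\{YYYY}; I should have copied the contents of {XXXX-XXXX}, i.e. the {YYYY} directories themselves. I did that and it works. This matters because each {YYYY} directory is a TSDB block, and Prometheus only loads blocks that sit as their own subdirectories directly under --storage.tsdb.path.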

    It is also worth noting that it may take a while for those metrics to become visible.
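The fix can be scripted for the support team. Copy each block directory of the snapshot (the children of {XXXX-XXXX}) into the directory mounted as --storage.tsdb.path, start the container, and then wait for Prometheus to report ready. This is a sketch under a few assumptions: the data directory is the ./prometheus/data bind mount from the compose file, Prometheus exposes the /-/ready endpoint on localhost:9090, and the copy is done while the container is stopped. `restore_snapshot` and `wait_until_ready` are hypothetical helpers. One more caveat: blocks older than the retention window (--storage.tsdb.retention.time, 15 days by default) may eventually be removed by retention once newer blocks exist, so an old snapshot may call for a longer retention setting.

```python
import shutil
import time
import urllib.request
from pathlib import Path


def restore_snapshot(snapshot_dir, data_dir):
    """Copy each block directory of a snapshot into data_dir.

    snapshot_dir is the snapshots/<name> directory (the {XXXX-XXXX} level);
    every subdirectory containing meta.json is one TSDB block. The block
    directories themselves are copied, not just their contents.
    Returns the names of the copied blocks.
    """
    src, dst = Path(snapshot_dir), Path(data_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for block in sorted(src.iterdir()):
        if block.is_dir() and (block / "meta.json").is_file():
            # copytree raises FileExistsError rather than overwrite a block.
            shutil.copytree(block, dst / block.name)
            copied.append(block.name)
    return copied


def _http_ok(url):
    """Return True if GET url answers 200, False on any HTTP/network error."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:  # URLError/HTTPError (e.g. 503 while starting) and timeouts
        return False


def wait_until_ready(base_url="http://localhost:9090", timeout_s=300.0,
                     interval_s=5.0, probe=_http_ok,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll <base_url>/-/ready until it succeeds or timeout_s elapses."""
    url = base_url.rstrip("/") + "/-/ready"
    deadline = clock() + timeout_s
    while True:
        if probe(url):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A typical sequence would be `restore_snapshot("XXXX-XXXX", "./prometheus/data")`, then `docker-compose up -d`, then `wait_until_ready()`. The `probe`, `sleep` and `clock` parameters exist only so the polling loop can be exercised without a live server.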