Backup ClickHouse cluster database without using clickhouse-backup utility


I have a three-node ClickHouse cluster, and I need to back up its database using the ClickHouse BACKUP command.

replica = 1

shard = 3

I tried both local storage that is shared across all nodes and S3 storage.

When I try:

backup database mydb to Disk('backups','backup.zip')

This backs up the database, but the backup can't be restored with the ON CLUSTER clause.

backup database mydb on cluster mycluster to Disk('s3','backup.zip')

This exits with an error saying that node x lost its lock on the backup.zip file.
Can someone please recommend a proper way to back up a cluster database?


Solution

  • In the end, I had to choose between two options:

    • Back up the individual ReplicatedMergeTree tables on each node. To restore, restore the tables from the backup on their respective nodes and re-create the distributed tables on top.
    • Export as Parquet. This doesn't require separate backups from every node: a single export of the distributed table reads the data from all ReplicatedMergeTree tables behind it. It is very fast, and the output is compressed (Snappy by default in my case). Restoring is as simple as: INSERT INTO db.distributed_table SELECT * FROM file/s3/s3Cluster(filepath). I opted for Parquet files.
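
For reference, here is a rough sketch of what the two options can look like in SQL. The table names (mydb.events_local, mydb.events), the backup file name, the bucket URL and the credentials are all placeholders I made up for illustration, so adjust them to your own schema and storage. Treat it as a starting point under those assumptions, not a tested procedure.

```sql
-- Option 1: per-node backup of the local replicated table.
-- Run on each node (one backup per shard); 'backups' is the disk name from the question.
-- mydb.events_local and the .zip name are hypothetical.
BACKUP TABLE mydb.events_local TO Disk('backups', 'events_local_shard1.zip');

-- To restore, run on the node that owns the same shard.
RESTORE TABLE mydb.events_local FROM Disk('backups', 'events_local_shard1.zip');

-- Then re-create the distributed table on top of the local tables.
-- rand() as the sharding key is an assumption; use your original key.
CREATE TABLE IF NOT EXISTS mydb.events ON CLUSTER mycluster
AS mydb.events_local
ENGINE = Distributed(mycluster, mydb, events_local, rand());

-- Option 2: export the distributed table to Parquet on S3.
-- One statement reads from every shard through the distributed table.
-- The URL and credentials below are placeholders.
INSERT INTO FUNCTION s3(
    'https://my-bucket.s3.amazonaws.com/backups/events.parquet',
    'ACCESS_KEY', 'SECRET_KEY', 'Parquet')
SELECT * FROM mydb.events;

-- Restore by inserting back through the distributed table,
-- which routes the rows to the shards again.
INSERT INTO mydb.events
SELECT * FROM s3Cluster(
    'mycluster',
    'https://my-bucket.s3.amazonaws.com/backups/events.parquet',
    'ACCESS_KEY', 'SECRET_KEY', 'Parquet');
```

For a local export instead of S3, the file() table function (or SELECT ... INTO OUTFILE ... FORMAT Parquet from clickhouse-client) should work the same way. Keep in mind that a Parquet export only carries the data, so the table definitions need to be saved separately, for example with SHOW CREATE TABLE.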