
CrateDB snapshot size


I have set up a Python script that creates a CrateDB snapshot every day at noon. The query I ran to set up the repository is:

CREATE REPOSITORY repo_name TYPE FS WITH (LOCATION='/path/to/folder', compress=true);

The query I run every day to create the snapshot is:

CREATE SNAPSHOT repo_name.{} ALL WITH (wait_for_completion=true, ignore_unavailable=true);

On the initial run, the snapshot directory was the same size as the database (30GB).

After about a month, the database has grown to 40GB, while the snapshot directory has grown to ~120GB (about three times the size of the database!).

Is this normal?

If yes, are there any options/optimizations I can try out to reduce the size of the snapshots?


Solution

  • According to Crate, the table data is not compressed. The compress option applies only to the snapshot metadata. (I agree it's confusing.)

    Snapshots are incremental, so I'm not entirely sure why the repository has grown so much. (Do you perhaps ingest a lot of data that is later removed but is still present when a snapshot is taken?) It may be worth raising an issue with Crate directly on their GitHub to check whether this is a bug.
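
If the growth comes from older snapshots still holding on to data files that the database itself no longer needs, dropping snapshots you no longer want is one thing to try. The following is a minimal sketch, not a tested solution: it assumes date-stamped snapshot names (a hypothetical naming scheme), a DB-API cursor such as the one provided by the crate Python client, and a retention count `keep` chosen purely for illustration. The helper names are made up; the statements rely on the `sys.snapshots` table and `DROP SNAPSHOT`.

```python
from datetime import date


def snapshot_name(day: date) -> str:
    """Build a name such as 'snapshot_20240131' (hypothetical naming scheme)."""
    return "snapshot_" + day.strftime("%Y%m%d")


def create_snapshot_sql(repo: str, name: str) -> str:
    """Fill in the daily CREATE SNAPSHOT statement from the question."""
    return (
        "CREATE SNAPSHOT {}.{} ALL "
        "WITH (wait_for_completion=true, ignore_unavailable=true)".format(repo, name)
    )


def snapshots_to_drop(names, keep):
    """Return every name except the newest `keep`.

    Assumes date-stamped names, so a plain sort is chronological.
    """
    ordered = sorted(names)
    return ordered[:-keep] if keep > 0 else ordered


def prune(cursor, repo, keep=7):
    """Drop old snapshots through a DB-API cursor and return the dropped names."""
    cursor.execute("SELECT name FROM sys.snapshots WHERE repository = ?", (repo,))
    names = [row[0] for row in cursor.fetchall()]
    dropped = snapshots_to_drop(names, keep)
    for name in dropped:
        # Identifiers cannot be bound as parameters, so the name is formatted in.
        cursor.execute("DROP SNAPSHOT {}.{}".format(repo, name))
    return dropped
```

Running `SELECT name, started, state FROM sys.snapshots` first shows how many snapshots the repository currently holds, which helps judge whether retention is the cause before anything is dropped.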