MongoDB: rsync whole db folder, except one huge collection that doesn't change


Here is the content of my MongoDB /data/db folder. One of the collections is 723GB; the others are just a few KB.

-rw------- 1 lxd  docker 723G Dec  5 10:15 collection-0-1080408413244540209.wt
-rw------- 1 lxd  docker  36K Dec  5 10:15 collection-0-3112968025499504303.wt
-rw------- 1 lxd  docker  36K Dec  5 10:15 collection-2-1080408413244540209.wt
-rw------- 1 lxd  docker 4.0K Dec  5 10:14 collection-4-1080408413244540209.wt
-rw------- 1 lxd  docker  20K Dec  5 10:15 collection-7-1080408413244540209.wt
-rw------- 1 lxd  docker    0 Dec  5 10:14 .dbshell
drwx------ 2 lxd  docker   90 Dec  5 10:15 diagnostic.data
-rw------- 1 lxd  docker 8.1G Dec  5 10:15 index-1-1080408413244540209.wt
-rw------- 1 lxd  docker  20K Dec  5 10:14 index-1-3112968025499504303.wt
-rw------- 1 lxd  docker  36K Dec  5 10:15 index-3-1080408413244540209.wt
-rw------- 1 lxd  docker 4.0K Dec  5 10:14 index-5-1080408413244540209.wt
-rw------- 1 lxd  docker 4.0K Dec  5 10:14 index-6-1080408413244540209.wt
-rw------- 1 lxd  docker  20K Dec  5 10:14 index-8-1080408413244540209.wt
-rw------- 1 lxd  docker  20K Dec  5 10:15 index-9-1080408413244540209.wt
drwx------ 2 lxd  docker  110 Dec  5 10:15 journal
-rw------- 1 lxd  docker  36K Dec  5 10:15 _mdb_catalog.wt
-rw------- 1 lxd  docker    0 Dec  5 10:15 mongod.lock
-rw------- 1 lxd  docker  36K Dec  5 10:15 sizeStorer.wt
-rw------- 1 lxd  docker  114 Dec  5 10:14 storage.bson
-rw------- 1 lxd  docker   50 Dec  5 10:14 WiredTiger
-rw------- 1 lxd  docker 4.0K Dec  5 10:15 WiredTigerHS.wt
-rw------- 1 lxd  docker   21 Dec  5 10:14 WiredTiger.lock
-rw------- 1 lxd  docker 1.5K Dec  5 10:15 WiredTiger.turtle
-rw------- 1 lxd  docker  84K Dec  5 10:15 WiredTiger.wt
  • I simply back up the whole docker volume to another server via rsync.
  • I never change the content of the 723GB collection, but for some reason MongoDB updates the file approximately once a week.
  • Because of that, rsync also updates that file on the remote side. And because I'm using snapshots, every week a new snapshot adds another 723GB to the storage, which is unacceptable and causes me problems.

To resolve that problem, I simply added the 723GB collection to the rsync exclude list and no longer upload it. Is that fine? After 1 year, can I still use my backup to restore the server if I no longer update the collection-0-1080408413244540209.wt file?


Solution

  • By default, rsync copies only new or changed files from source to destination, so you don't need to add the file to the exclude list if you copy the mounted snapshot files. On the other hand, WiredTiger is sensitive to inconsistency: it keeps checkpoint information, including checksums, for all collections in the data root folder (in its metadata file WiredTiger.wt), so if the collection file differs from what that metadata expects, there is a good chance the mongod process will not be able to start from the restored snapshot. So I would suggest not excluding the file completely, but letting rsync check on every run whether the file is unchanged or needs to be copied again.

    P.S. Note that the scenario you describe was valid with MongoDB's previous, now deprecated storage engine, MMAPv1, where it was even possible to copy a collection inside a running instance on the fly. Unfortunately, WiredTiger does not allow this, but it offers other advantages.
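
To make the "rsync only copies changed files" point concrete, here is a minimal Python sketch of the two ways a sync tool can decide that a file has changed. It is an illustration, not rsync's actual code: `quick_check_differs` approximates rsync's default quick check (file size plus modification time), and `content_differs` compares SHA-256 digests, similar in spirit to running rsync with `--checksum`. Both function names and the demo file names are hypothetical. Whether the bytes of the 723GB file really change each week is not established in the question; the sketch only shows why a file whose timestamp is touched gets transferred again under the default check.

```python
import hashlib
import os
import shutil
import tempfile


def quick_check_differs(src: str, dst: str) -> bool:
    """Approximate rsync's default quick check: compare size and mtime only."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so a very large file never sits in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def content_differs(src: str, dst: str) -> bool:
    """Compare actual file contents (slow for huge files: reads both fully)."""
    if not os.path.exists(dst):
        return True
    return file_digest(src) != file_digest(dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "collection-demo.wt")   # hypothetical file
        dst = os.path.join(tmp, "backup-demo.wt")       # hypothetical backup
        with open(src, "wb") as f:
            f.write(b"\0" * 4096)
        shutil.copy2(src, dst)  # copy2 preserves the modification time

        # Touch the source: same bytes, newer timestamp.
        st = os.stat(src)
        os.utime(src, (st.st_atime, st.st_mtime + 60))

        print("quick check says changed:", quick_check_differs(src, dst))
        print("content actually changed:", content_differs(src, dst))
```

On a 723GB file the content comparison is expensive because both copies must be read completely, so it is better suited to an occasional verification of the backup than to every sync run.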