Here is the content of the MongoDB /data/db folder. One of the collections is 723 GB, while the other collections are just a few KB.
-rw------- 1 lxd docker 723G Dec 5 10:15 collection-0-1080408413244540209.wt
-rw------- 1 lxd docker 36K Dec 5 10:15 collection-0-3112968025499504303.wt
-rw------- 1 lxd docker 36K Dec 5 10:15 collection-2-1080408413244540209.wt
-rw------- 1 lxd docker 4.0K Dec 5 10:14 collection-4-1080408413244540209.wt
-rw------- 1 lxd docker 20K Dec 5 10:15 collection-7-1080408413244540209.wt
-rw------- 1 lxd docker 0 Dec 5 10:14 .dbshell
drwx------ 2 lxd docker 90 Dec 5 10:15 diagnostic.data
-rw------- 1 lxd docker 8.1G Dec 5 10:15 index-1-1080408413244540209.wt
-rw------- 1 lxd docker 20K Dec 5 10:14 index-1-3112968025499504303.wt
-rw------- 1 lxd docker 36K Dec 5 10:15 index-3-1080408413244540209.wt
-rw------- 1 lxd docker 4.0K Dec 5 10:14 index-5-1080408413244540209.wt
-rw------- 1 lxd docker 4.0K Dec 5 10:14 index-6-1080408413244540209.wt
-rw------- 1 lxd docker 20K Dec 5 10:14 index-8-1080408413244540209.wt
-rw------- 1 lxd docker 20K Dec 5 10:15 index-9-1080408413244540209.wt
drwx------ 2 lxd docker 110 Dec 5 10:15 journal
-rw------- 1 lxd docker 36K Dec 5 10:15 _mdb_catalog.wt
-rw------- 1 lxd docker 0 Dec 5 10:15 mongod.lock
-rw------- 1 lxd docker 36K Dec 5 10:15 sizeStorer.wt
-rw------- 1 lxd docker 114 Dec 5 10:14 storage.bson
-rw------- 1 lxd docker 50 Dec 5 10:14 WiredTiger
-rw------- 1 lxd docker 4.0K Dec 5 10:15 WiredTigerHS.wt
-rw------- 1 lxd docker 21 Dec 5 10:14 WiredTiger.lock
-rw------- 1 lxd docker 1.5K Dec 5 10:15 WiredTiger.turtle
-rw------- 1 lxd docker 84K Dec 5 10:15 WiredTiger.wt
To resolve that problem, I simply added the 723 GB collection file to the rsync exclusion list, so it is no longer uploaded. Is that safe? One year from now, will I still be able to restore the server from my backup if I never update collection-0-1080408413244540209.wt again?
By default, rsync copies only new or changed files from the source to the destination (it compares each file's size and modification time), so you don't need to add the file to the exclusion list if you copy the files from a mounted snapshot. On the other hand, WiredTiger is rather sensitive: the metadata files in the data root folder (such as WiredTiger.wt and WiredTiger.turtle) record checkpoint information, including checksums, for every collection file, and a collection file is rewritten at checkpoints whenever its data changes. If the restored collection file does not match that metadata, there is a good chance the mongod process will not be able to start from the restored snapshot. So I would suggest not excluding the file completely, but letting rsync check on every run whether the file has changed and needs to be copied again.
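To make "let rsync check every time" concrete, here is a minimal Python sketch that approximates how rsync decides whether a single file must be transferred. It is not rsync's code: the functions `file_digest`, `needs_copy` and `plan_sync` are hypothetical names for illustration, and the use of SHA-256 is an assumption (rsync negotiates its own checksum algorithm). The default mode mimics rsync's "quick check" (size plus modification time); `checksum=True` mimics `rsync --checksum`.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so a huge .wt file never has to fit in memory."""
    h = hashlib.sha256()  # assumption: rsync itself uses a different algorithm
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_copy(src: Path, dst: Path, checksum: bool = False) -> bool:
    """Approximate rsync's per-file transfer decision (a sketch, not rsync's code)."""
    if not dst.exists():
        return True  # new file: always copied
    s, d = src.stat(), dst.stat()
    if s.st_size != d.st_size:
        return True  # size differs: treated as changed in both modes
    if checksum:
        # Like `rsync --checksum`: compare content, ignore modification time.
        return file_digest(src) != file_digest(dst)
    # Default "quick check": same size and same mtime means "unchanged".
    # (Nanosecond precision here; rsync's own granularity may differ.)
    return s.st_mtime_ns != d.st_mtime_ns


def plan_sync(src_dir: Path, dst_dir: Path, checksum: bool = False) -> list[str]:
    """Return the relative paths of the files a sync would (re)copy."""
    return sorted(
        str(p.relative_to(src_dir))
        for p in src_dir.rglob("*")
        if p.is_file() and needs_copy(p, dst_dir / p.relative_to(src_dir), checksum)
    )
```

For example, `plan_sync(Path("/mnt/snapshot/db"), Path("/backup/db"))` (hypothetical paths) lists only the files whose size or modification time changed, so an unchanged 723 GB collection file costs a single stat() call and is skipped without any exclusion rule. With `checksum=True`, the file is read in full on both sides on every run, which is also the cost of `rsync --checksum`, so the default quick check is usually the practical choice.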
P.S. The scenario you describe would have worked with MMAPv1, MongoDB's previous (now deprecated) storage engine, where it was even possible to copy a collection into a running instance on the fly. Unfortunately, WiredTiger does not allow this, but it offers other advantages.