I am using Open edX, which stores courses in MongoDB; we are running a three-node replica set. It uses Split Mongo, a modulestore that saves a new copy of a course's structure document before every edit, keeping the old versions as history. Over time these copies pile up and consume a large amount of disk space. We currently have around 30 courses, and exporting them all produces only 2-3 GB, yet the database occupies far more space on disk (around 55 GB, as the df output below shows).
I tried to clean up the unwanted courses with a script. When executed on the primary member, it takes some time and deletes all the unwanted documents, but it does not release the disk space.
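For context, cleaning up Split Mongo usually means deleting structure documents that are no longer reachable from any entry in modulestore.active_versions, following the previous_version links. The collection names are Open edX's; everything else below (the function, the sample data) is an illustrative sketch, not the actual cleanup script, and whether to keep a version's whole history chain or only recent versions is a policy choice:

```python
# Hedged sketch: in Split Mongo, modulestore.structures documents form version
# chains linked by previous_version; only versions reachable from an id listed
# in modulestore.active_versions are still needed. The in-memory dicts below
# stand in for the real collections.

def find_stale_structures(structures, active_ids):
    """structures: dict mapping _id -> previous_version (or None).
    active_ids: iterable of _ids referenced by modulestore.active_versions.
    Returns the set of _ids unreachable from any active id (candidates to delete)."""
    reachable = set()
    stack = list(active_ids)
    while stack:
        sid = stack.pop()
        if sid in reachable or sid not in structures:
            continue
        reachable.add(sid)
        prev = structures[sid]
        if prev is not None:
            stack.append(prev)
    return set(structures) - reachable

# Illustrative version chain v1 <- v2 <- v3 (v3 active), plus an orphaned chain.
chain = {"v1": None, "v2": "v1", "v3": "v2", "orphan1": None, "orphan2": "orphan1"}
print(sorted(find_stale_structures(chain, ["v3"])))  # → ['orphan1', 'orphan2']
```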
rs0:SECONDARY> db.stats()
{
"db" : "edxapp",
"collections" : 5,
"objects" : 277557,
"avgObjSize" : 112645.21484235671,
"dataSize" : 31265467896,
"storageSize" : 57843929088,
"numExtents" : 0,
"indexes" : 6,
"indexSize" : 6938624,
"ok" : 1
}
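A quick sanity check on the db.stats() numbers above: storageSize is the space allocated on disk, dataSize the space actually occupied by documents, so the gap is roughly what the deletes freed inside the files without returning it to the OS:

```python
# Numbers copied from the db.stats() output above.
data_size = 31265467896      # dataSize
storage_size = 57843929088   # storageSize

reclaimable = storage_size - data_size
print(f"~{reclaimable / 1024**3:.1f} GiB allocated but unused")  # ≈ 24.8 GiB
```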
root@mongo:~# df
Filesystem 1K-blocks Used Available Use% Mounted on
udev 4082828 12 4082816 1% /dev
tmpfs 817564 396 817168 1% /run
/dev/xvda1 8115168 1805528 5874364 24% /
none 4 0 4 0% /sys/fs/cgroup
none 5120 0 5120 0% /run/lock
none 4087804 0 4087804 0% /run/shm
none 102400 0 102400 0% /run/user
/dev/xvdf 62904320 57542660 5361660 92% /edx
/dev/xvdh 72117576 53012 68378164 1% /tmp/repairdb
I tried to compact the DB using
rs0:SECONDARY> db.runCommand( { compact : 'modulestore.structures', force: 'true' } )
{ "ok" : 1 }
It didn't help either.
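Two hedged observations on that compact call: force is documented as a boolean, so passing the string 'true' is at best relying on coercion, and compact only defragments inside the existing data files; on the MMAPv1 engine it does not shrink the files or return space to the OS, which is consistent with what you saw. A sketch of issuing it per collection (the URI and the driver usage are illustrative, assuming pymongo):

```python
# Hedged sketch: compact is issued per collection against the current database,
# with a boolean force (required to run on a primary in older MongoDB versions).

def compact_command(collection_name, force=True):
    # compact takes the bare collection name within the current database
    return {"compact": collection_name, "force": force}

# Usage (requires pymongo and a reachable replica-set member; not run here):
#   from pymongo import MongoClient
#   db = MongoClient("mongodb://localhost:27017")["edxapp"]  # hypothetical URI
#   for coll in ("modulestore.structures", "modulestore.definitions"):
#       print(db.command(compact_command(coll)))
print(compact_command("modulestore.structures"))
```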
Could someone please let me know how to reclaim the disk space in this situation? I need to do this on a production server as quickly as possible.
You need to do an initial sync: one secondary at a time, and finally step down your primary and run an initial sync on it too.
That is, stop a secondary, remove all files from the node's dbPath, then start the node and let it perform an initial sync. Repeat this for every node.
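The ordering above is the important part: resync each secondary in turn while the set stays available, and touch the primary only after stepping it down. A minimal sketch of that ordering logic (hostnames are hypothetical):

```python
# Hedged sketch of the rolling initial-sync order: every secondary first,
# the current primary last (after rs.stepDown()).

def resync_order(members, primary):
    """Return the order in which to wipe dbPath and resync replica set members."""
    secondaries = [m for m in members if m != primary]
    return secondaries + [primary]

print(resync_order(["mongo1", "mongo2", "mongo3"], primary="mongo1"))
# → ['mongo2', 'mongo3', 'mongo1']
```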