I've started to analyze how MongoDB works on Amazon AWS, and I feel like I'm missing something fundamental here. From what I've read in the Amazon Storage docs, it looks like Amazon automatically backs up its hardware disks. So, if they can transparently restore every disk (including the ones that store MongoDB data), do I still need to care about backup and recovery?
I'm mostly interested in disaster or failure recovery, particularly as it relates to hardware failure. It's unclear to me whether Amazon already handles that automatically (using disk mirroring or predefined backup schedules), or whether we still need to do it manually (lock, back up, and then restore some day). If the latter, what happens when a disk fails on AWS? Does the data get corrupted (so the website breaks and is only partly functional), we get an email from AWS at night, and then we need to restore the database first thing in the morning? :)
I think your analysis is based on wrong, if not dangerous, assumptions. Some basics:
Here is how I do it, simplified, since a detailed analysis requires knowing the use case, the direct and indirect costs per hour of downtime, and quite a few other factors.
A small comparison between the mentioned backup methods:
mongodump is a nifty tool that lets you create a backup of a remote machine. That is an advantage: you do not have to move the data off the data-bearing machine manually, and you don't need to provision additional disk space on that machine. Its drawback is that restores are pretty slow. MongoDB itself suggests using mongodump only on small databases, which I can only second. As for what counts as small, I personally draw the line at roughly 1GB.
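As a rough illustration of the remote-backup idea, here is a minimal sketch that drives `mongodump` and `mongorestore` from a separate backup machine. The host name, port, and paths you pass in are placeholders, and the `--gzip` flag assumes MongoDB 3.2 or later tools. The two builder functions only assemble the argument lists; `run` actually executes them.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path


def mongodump_cmd(host: str, port: int, out_dir: Path) -> list[str]:
    """Build a mongodump call that pulls data from a remote mongod.

    The dump lands on the machine running this script, so the database
    host needs no extra disk space.
    """
    return [
        "mongodump",
        "--host", host,
        "--port", str(port),
        "--gzip",               # compress each dumped collection file
        "--out", str(out_dir),  # local directory that receives the dump
    ]


def mongorestore_cmd(host: str, port: int, dump_dir: Path) -> list[str]:
    """Build the matching mongorestore call (slow on large data sets)."""
    return [
        "mongorestore",
        "--host", host,
        "--port", str(port),
        "--gzip",               # the dump files were written compressed
        "--drop",               # drop each collection before restoring it
        str(dump_dir),
    ]


def run(cmd: list[str]) -> None:
    """Execute a command, raising CalledProcessError if it fails."""
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, `run(mongodump_cmd("db.example.internal", 27017, Path("/backups/nightly")))` would dump a (hypothetical) remote host into a local directory. Restoring reads every document back through the server, which is why restores of anything beyond a small database become slow.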
When done right, this method is extremely flexible: you can take consistent backups of both your MongoDB data and your other application data (files, for example) in a single step, create a compressed tar file from it, and store it at an off-site location using pretty simple shell scripts. The drawbacks are that you need to over-provision your disks, compression takes time and resources as well, and you need to know what you are doing.
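The "lock, snapshot, unlock, then compress" sequence can be sketched as below. This is only an outline under stated assumptions: `snapshot`, `lock` and `unlock` are hypothetical callables you supply. With pymongo, for instance, `lock` could be `client.admin.command("fsync", lock=True)` and `unlock` could be `client.admin.command("fsyncUnlock")`, while `snapshot` would wrap whatever your volume manager provides and return the path where the frozen copy is mounted. Shipping the archive off site is left out.

```python
from __future__ import annotations

import tarfile
from pathlib import Path
from typing import Callable


def snapshot_and_archive(
    snapshot: Callable[[], Path],
    dest: Path,
    lock: Callable[[], None] = lambda: None,
    unlock: Callable[[], None] = lambda: None,
) -> Path:
    """Freeze writes, take a snapshot, resume writes, then tar.gz the snapshot.

    All three callables are hypothetical hooks supplied by the caller.
    """
    # Block writes only for as long as the snapshot itself takes.
    lock()
    try:
        snap_root = snapshot()
    finally:
        # Always release the lock, even if the snapshot fails.
        unlock()
    # Compression runs afterwards on the frozen copy, so its CPU and time
    # cost does not keep the database locked.
    with tarfile.open(dest, "w:gz") as tar:
        tar.add(str(snap_root), arcname=snap_root.name)
    return dest
```

Keeping the lock window limited to the snapshot step is the main design point: the expensive compression happens on the copy, which is also why the disks need spare capacity to hold it.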
MMS backup is the Ferrari of backup methods for MongoDB: it offers real-time backup and corresponding point-in-time recovery, and it is extremely simple to set up and to recover from. However, it comes with a rather big price tag, even more so on AWS, as the data is sent (encrypted, of course) to MMS, which should count as external traffic. Still, there are use cases where I would advise using MMS on AWS: anything directly related to financial transactions (in the business sense) or with extremely tight SLAs should use MMS, as it offers real point-in-time recovery.