Tags: mongodb, amazon-web-services, amazon-ec2, backup, recovery

How frequently do we need to perform MongoDB backups on AWS?


I've started to analyze how MongoDB works on Amazon AWS and I feel like I'm missing something fundamental here. From what I've read in the Amazon storage docs, it looks like Amazon backs up its hardware disks automatically. So, if they are able to transparently restore every disk (which stores MongoDB data), do I still need to care about backup and recovery?

I'm mostly interested in disaster and failure recovery as it relates to hardware failure. It's unclear whether Amazon already handles that automatically (using disk mirroring or predefined backup schedules), or whether we still need to do it manually (lock, back up, and then restore some day). If not, what happens when a disk fails on AWS? Does the data get corrupted (so the website is corrupted and only partly functional), we get an email from AWS at night, and then we need to restore the database immediately in the morning? :)


Solution

  • I think your analysis is based on wrong, if not dangerous, assumptions. Some basics:

    1. Backup intervals are determined first and foremost by the acceptable loss of data in a worst-case scenario.
    2. The means of ensuring data availability provided by AWS (or MongoDB, for that matter) are no replacement for backups. Disk mirroring does not help if data is lost due to DBA errors, for example.
    3. Backup intervals and methods should reflect your (internal?) SLAs.
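As a rough illustration of the first point, the sketch below turns an acceptable data-loss window into a minimum number of backups per day. The function name and the sample values are assumptions made for illustration, and the calculation ignores how long a backup itself takes to run.

```python
import math


def backups_per_day(max_loss_minutes: float) -> int:
    """Minimum number of evenly spaced backups per day so that the gap
    between two backups never exceeds the acceptable data-loss window."""
    if max_loss_minutes <= 0:
        raise ValueError("max_loss_minutes must be positive")
    return math.ceil(24 * 60 / max_loss_minutes)


# Hypothetical windows: losing up to 4 hours, 90 minutes, or 1 hour of data.
for window in (240, 90, 60):
    print(f"{window} min acceptable loss -> {backups_per_day(window)} backups/day")
```

If the window shrinks towards minutes or seconds, scheduled backups alone stop being practical and a continuous method becomes the more realistic option.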

    Here is how I do it. This is simplified, as a detailed analysis requires knowing the use case, the direct and indirect costs per hour of downtime, and quite a few other factors.

    1. Find out the turnover per hour.
    2. Find as many recovery methods as possible. For MongoDB, the most prominent are mongodump (which I rarely use, and then only for very small databases), disk snapshots (I prefer using LVM for these), and MMS backups.
    3. Develop the most time-efficient recovery plan for each of the methods you chose.
    4. Test those plans with worst-case scenarios (total loss of data, both MongoDB's and, if applicable, other application data) and refine them if necessary.
    5. Choose the one with the best balance between recovery time (take your SLAs into account) and acceptable costs. Acceptable costs per year are the fraction of your turnover you are willing to spend on backups, plus the estimated downtime including recovery in hours per year (be conservative; I usually multiply the current value by at least 1.5) multiplied by the turnover per hour. Keep in mind that using replica sets and load-balanced frontends may drastically reduce your overall downtime, while providing other benefits.
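The cost comparison in step 5 can be sketched as follows. This is one possible reading of the formula: yearly cost = backup spend + (estimated downtime in h/year × safety factor) × turnover/h. All figures, the method entries, and the function name are hypothetical placeholders, not measurements.

```python
def yearly_cost(backup_spend, downtime_h, turnover_per_h, safety_factor=1.5):
    """Direct backup spend plus the turnover lost to (padded) downtime."""
    return backup_spend + downtime_h * safety_factor * turnover_per_h


# Hypothetical turnover and per-method estimates; replace with your own.
TURNOVER_PER_H = 800
methods = {
    "mongodump": {"backup_spend": 500, "downtime_h": 10},
    "LVM snapshots": {"backup_spend": 2000, "downtime_h": 4},
    "MMS backup": {"backup_spend": 6000, "downtime_h": 1},
}

costs = {
    name: yearly_cost(m["backup_spend"], m["downtime_h"], TURNOVER_PER_H)
    for name, m in methods.items()
}
cheapest = min(costs, key=costs.get)

for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name}: {cost:.0f} per year")
print("Lowest estimated yearly cost:", cheapest)
```

The cheapest method on paper still has to meet the recovery times your SLAs demand, so treat the result as one input to the decision rather than the decision itself.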

    A small comparison between the mentioned backup methods:

    mongodump

    A nifty tool that lets you back up a remote machine. That is an advantage, as you do not have to move the data off the data-bearing machine manually and you don't need to provision additional disk space on that machine. Its drawback is that restores are pretty slow. MongoDB suggests using mongodump only on small databases, which I can only second. As for defining small, I personally draw the line at roughly 1 GB.
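A minimal sketch of driving a remote mongodump from a script, assuming the mongodump binary is installed on the machine that runs the backup. The host name and output directory are hypothetical; `--host`, `--port`, `--gzip`, and `--out` are standard mongodump options.

```python
import shlex
import subprocess


def mongodump_command(host, port, out_dir):
    """Build the argument list for a compressed dump of a remote mongod."""
    return [
        "mongodump",
        "--host", host,
        "--port", str(port),
        "--gzip",          # compress each dumped collection file
        "--out", out_dir,  # local directory that receives the dump
    ]


def run_backup(host, port, out_dir):
    """Run the dump; raises CalledProcessError if mongodump fails."""
    subprocess.run(mongodump_command(host, port, out_dir), check=True)


# Hypothetical host and target directory, shown here without executing anything.
cmd = mongodump_command("db.example.internal", 27017, "/backups/2024-01-01")
print(shlex.join(cmd))
```

Calling `run_backup(...)` with the same arguments performs the dump; restoring goes through mongorestore, which is where the slow part mentioned above shows up.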

    LVM snapshots

    When done right, this method is extremely flexible: you can take consistent backups of both your MongoDB data and your other application data (such as files) in a single step, create a compressed tar file from it, and store it at an off-site location by means of pretty simple shell scripts. The drawbacks are that you need to over-provision your disks, compression takes time and resources as well, and you need to have some knowledge of what you are doing.
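The snapshot-then-tar sequence described above might look roughly like the sketch below, which only builds and prints the commands. The volume group, logical volume, snapshot size, mount point, and archive path are all hypothetical, mount options differ between filesystems, and running the commands for real needs root privileges.

```python
import shlex


def lvm_backup_commands(vg, lv, snap_name, snap_size, mount_point, archive_path):
    """Return the shell commands for one snapshot-based backup, in order."""
    origin = f"/dev/{vg}/{lv}"
    snapshot = f"/dev/{vg}/{snap_name}"
    return [
        # 1. Freeze a point-in-time view of the data volume.
        ["lvcreate", "--snapshot", "--size", snap_size, "--name", snap_name, origin],
        # 2. Mount the snapshot read-only (options vary by filesystem).
        ["mount", "-o", "ro", snapshot, mount_point],
        # 3. Pack the snapshot contents into a compressed tar file.
        ["tar", "-czf", archive_path, "-C", mount_point, "."],
        # 4. Clean up: unmount, then drop the snapshot to free its space.
        ["umount", mount_point],
        ["lvremove", "-f", snapshot],
    ]


# Hypothetical names and paths, for illustration only.
commands = lvm_backup_commands(
    "vg0", "mongodb", "mdb-snap", "1G", "/mnt/mdb-snap", "/backup/mdb-snap.tar.gz"
)
for command in commands:
    print(shlex.join(command))
```

The snapshot size is where the over-provisioning comes in: the volume group needs enough free space to absorb every write that lands on the origin volume while the snapshot exists.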

    MMS backup

    This is the Ferrari of backup methods for MongoDB: it offers real-time backup with corresponding point-in-time recovery, and it is extremely simple to set up and recover from. However, it comes with a rather big price tag, even more so on AWS, as the data is sent (encrypted, of course) to MMS, which should count as external traffic. Still, there are use cases where I would advise using MMS on AWS: anything that is directly related to financial transactions (in the business sense) or that has extremely tight SLAs should use MMS, as it offers real point-in-time recovery.