How do I properly snapshot an EBS volume with a RabbitMQ instance running?

I'm running RabbitMQ on an EC2 instance with its Mnesia tables on an EBS volume. When I snapshot the volume and try to launch another instance with the same data, the tables appear to be in use by another RabbitMQ instance.

Is the only way to get around this to shut RabbitMQ down for the flush/snapshot and then start it back up once it's done?

Is there a way to clean up the files so that they don't appear locked or are forcefully unlocked?

It's not a common problem I'll be facing, just curious if there's a better solution.

To clarify, the error I see is: timeout_waiting_for_tables.

Solution

The first thing to be concerned with is the filesystem. Not sure if you're using LVM, ext3, XFS or what, but if you're on LVM, you might want to check out the dmsetup man page, specifically dmsetup suspend / resume, which postpone I/O to the device while it is suspended.

You will end up with something like:

dmsetup suspend <dev> 
ec2-create-snapshot <vol> 
dmsetup resume <dev>
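
If it helps, here is a minimal sketch of those three steps wrapped in a function, so the device gets resumed even when the snapshot call fails. The function name snapshot_with_suspend and the DMSETUP / SNAPSHOT_CMD override variables are made up for this example; they are not part of dmsetup or the EC2 tools. It assumes that an EBS snapshot is point-in-time as of the moment the create call returns, so the device only needs to stay suspended for that call.

```shell
#!/bin/sh
# Sketch only: suspend a device-mapper device, start an EBS snapshot,
# then resume the device whether or not the snapshot call succeeded.
# DMSETUP and SNAPSHOT_CMD are hypothetical override hooks for this example.
: "${DMSETUP:=dmsetup}"
: "${SNAPSHOT_CMD:=ec2-create-snapshot}"

snapshot_with_suspend() {
    dev=$1   # device-mapper device holding the RabbitMQ data
    vol=$2   # EBS volume ID backing that device

    # If the suspend itself fails, do not snapshot at all.
    "$DMSETUP" suspend "$dev" || return 1

    # Start the snapshot and remember its exit status.
    "$SNAPSHOT_CMD" "$vol"
    status=$?

    # Always resume, otherwise all I/O to the device stays blocked.
    "$DMSETUP" resume "$dev" || return 1
    return "$status"
}
```

You would call it as snapshot_with_suspend <dev> <vol>, with the same placeholders as above.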

Once you've got the filesystem syncing / suspending, there's RabbitMQ itself to worry about. RabbitMQ developer Matthias Radestock states in an email thread:

But for the persistent messages I am not so sure. How is the rabbit_persister.LOG managed? Can I just take a backup copy of it whenever, or may I only take one of the rabbit_persister.LOG.previous?

I'd back up both files. It should be possible to restore the rabbit_persister.LOG from a backup even when that backup was taken in the middle of an append - I haven't tested that though. The .previous log is needed in case the backup takes place while the log is being rolled.

Where would I look to find out how often it is rolled?

The logic for deciding when to roll the log is rather complex.

Can I trigger a manual roll?

rabbit_persister:force_snapshot() in the Erlang shell does the trick.

Check out the force-snapshot target in the RabbitMQ Makefile.
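
If you'd rather not go through the Makefile, a rough sketch of sending the same call to a running node with erl_call might look like this. I haven't run it against a live broker. The function name force_persister_snapshot and the ERL_CALL / NODENAME variables are made up for this example, the node name rabbit is an assumption about your install, and erl_call has to be using the same Erlang cookie as the broker.

```shell
#!/bin/sh
# Sketch only: ask a running node to roll the persister log before a backup.
# ERL_CALL and NODENAME are hypothetical override hooks for this example.
: "${ERL_CALL:=erl_call}"
: "${NODENAME:=rabbit}"   # assumed short node name; adjust to your install

force_persister_snapshot() {
    # erl_call -e reads Erlang expressions from stdin and evaluates them
    # on the node named by -sname.
    echo "rabbit_persister:force_snapshot()." | "$ERL_CALL" -sname "$NODENAME" -e
}
```

Run force_persister_snapshot just before suspending the device, so the log has been rolled by the time the snapshot starts.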