Tags: amazon-ec2, rabbitmq, mnesia

Migrating data from RabbitMQ server


We have a RabbitMQ server hosted on AWS, and we recently received a notice that the instance will undergo maintenance and will be temporarily unavailable for a few hours. As it is a production server, we want to avoid downtime for our users, and we are currently thinking about strategies to migrate RabbitMQ to another server without losing data. It looks like there are two options:

  1. Connect additional nodes running on other machines and replicate the data to them.
  2. Install RabbitMQ on a new machine and copy the Mnesia files from the old server to the new one, then switch on the new server and switch off the old one. An image snapshot on AWS could simplify this process.

I was not able to find a way to implement (1) without wiping the existing data, so this option does not look workable. As for (2), it looks very manual and risky. Are there any other data migration strategies, or am I missing something here?


Solution

  • I managed to implement the first option and replicate the data without downtime by setting up a RabbitMQ cluster. To do that I followed the manual, but I had to tweak two things to make it work for my stack:

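    1. On AWS, RabbitMQ clustering did not work for me with plain IP addresses. Node names are based on short hostnames (for example rabbit@ip-10-242-86-191), so the cluster machines must be able to resolve each other's short hostnames. To make them see each other, edit the /etc/hosts file on each machine and map every cluster member's IP address to its short hostname: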

    vi /etc/hosts

    The file should look something like this:

    127.0.0.1 localhost
    10.242.86.191 ip-10-242-86-191
    ::1 ip6-localhost ip6-loopback
    fe00::0 ip6-localnet
    ff00::0 ip6-mcastprefix
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters
    ff02::3 ip6-allhosts

    2. After setting up the cluster, you need to set up replication as described here. The important fact is that queues are not replicated by default, so you need to set up a policy for queue replication like this:

    rabbitmqctl set_policy -p your_vhost policy_name "queue_pattern" '{"ha-mode":"all", "ha-sync-mode":"automatic"}'

    By the way, the -p your_vhost parameter was not mentioned in the docs, so be careful to specify the vhost if you use anything other than the default one.

    After setting everything up, the queues were replicated across the cluster and kept in sync (Mnesia shares the queue definitions across nodes, while mirroring copies the messages). That let me switch the first cluster machine off without downtime and switch it back on after the maintenance without losing any data.
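The /etc/hosts step above can be scripted. The sketch below assumes the instances keep the default EC2 private hostnames of the form ip-A-B-C-D, which is the pattern in the example entry; if your machines were renamed, use their real short names instead. It only prints the lines so you can review them before appending them on every cluster machine. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: print /etc/hosts lines that map cluster IPs to EC2-style short names.
# Assumption: hosts use the default EC2 naming "ip-" + IP with dots as dashes.
set -eu

# ec2_short_name IP -> prints e.g. ip-10-242-86-191 for 10.242.86.191
ec2_short_name() {
  printf 'ip-%s\n' "$(printf '%s' "$1" | sed 's/\./-/g')"
}

# hosts_entries IP... -> prints one "IP short-name" line per cluster member
hosts_entries() {
  for ip in "$@"; do
    printf '%s %s\n' "$ip" "$(ec2_short_name "$ip")"
  done
}
```

For example, `hosts_entries 10.242.86.191 OTHER_NODE_IP | sudo tee -a /etc/hosts`, where OTHER_NODE_IP is a placeholder for the private address of your other cluster member.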
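The answer does not list the clustering commands from the manual it followed. A typical join sequence, run on the new node, is sketched below using the standard rabbitmqctl subcommands. It assumes both machines share the same Erlang cookie and that rabbit@ip-10-242-86-191 is the existing node. Note that reset wipes the data of the node it runs on, so it belongs only on the new, empty machine and never on the server that holds your messages. The RABBITMQCTL variable is a hypothetical override hook for dry runs, not a RabbitMQ setting.

```shell
#!/bin/sh
# Sketch: join THIS (new, empty) node to an existing RabbitMQ cluster.
# RABBITMQCTL is a hypothetical override (e.g. RABBITMQCTL=echo for a dry run).
set -eu

RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"

# join_cluster EXISTING_NODE, e.g. join_cluster rabbit@ip-10-242-86-191
join_cluster() {
  "$RABBITMQCTL" stop_app            # stop the RabbitMQ app, keep the Erlang VM
  "$RABBITMQCTL" reset               # WARNING: wipes this node's own data
  "$RABBITMQCTL" join_cluster "$1"   # join the node that holds the data
  "$RABBITMQCTL" start_app           # start the app again as a cluster member
  "$RABBITMQCTL" cluster_status      # show the members to confirm the join
}
```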
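To avoid forgetting the vhost, the set_policy command above can be wrapped so the vhost must always be given explicitly (pass / for the default vhost). This is a sketch: the function name and the RABBITMQCTL override are hypothetical. Also keep in mind that ha-mode policies configure classic queue mirroring, which was deprecated in later 3.x releases and removed in RabbitMQ 4.0 in favour of quorum queues, so check which version you run.

```shell
#!/bin/sh
# Sketch: apply the mirroring policy from the answer with a mandatory vhost.
# RABBITMQCTL is a hypothetical override (e.g. RABBITMQCTL=echo for a dry run).
set -eu

RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"

# apply_ha_policy VHOST POLICY_NAME QUEUE_PATTERN
apply_ha_policy() {
  # Refuse to run without exactly three arguments or with an empty vhost
  if [ "$#" -ne 3 ] || [ -z "$1" ]; then
    echo "usage: apply_ha_policy VHOST POLICY_NAME QUEUE_PATTERN" >&2
    return 1
  fi
  # Options go before the positional name/pattern/definition arguments
  "$RABBITMQCTL" set_policy -p "$1" "$2" "$3" \
    '{"ha-mode":"all","ha-sync-mode":"automatic"}'
}
```

For example, `apply_ha_policy your_vhost policy_name 'queue_pattern'` issues the same command as in the answer, but fails early instead of silently targeting the default vhost when the vhost is left out.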