I'm running Celery in a Django app with RabbitMQ as the message broker. However, RabbitMQ keeps breaking down like so. First is the error I get from Django. The trace is mostly unimportant, because I know what is causing the error, as you will see.
Traceback (most recent call last):
...
File "/usr/local/lib/python2.6/dist-packages/amqplib/client_0_8/transport.py", line 85, in __init__
raise socket.error, msg
error: [Errno 111] Connection refused
I know that this is due to a corrupt rabbit_persister.log file. This is because after I kill all processes tied to RabbitMQ, I run "sudo rabbitmq-server start" to get the following crash:
...
starting queue recovery ...done
starting persister ...BOOT ERROR: FAILED
Reason: {{badmatch,{error,{{{badmatch,eof},
[{rabbit_persister,internal_load_snapshot,2},
{rabbit_persister,init,1},
{gen_server,init_it,6},
{proc_lib,init_p_do_apply,3}]},
{child,undefined,rabbit_persister,
{rabbit_persister,start_link,[]},
transient,100,worker,
[rabbit_persister]}}}},
[{rabbit_sup,start_child,2},
{rabbit,'-run_boot_step/1-lc$^1/1-1-',1},
{rabbit,run_boot_step,1},
{rabbit,'-start/2-lc$^0/1-0-',1},
{rabbit,start,2},
{application_master,start_it_old,4}]}
Erlang has closed
My current fix: Every time this happens, I rename the corresponding rabbit_persister.log file to something else (rabbit_persister.log.bak) and am able to restart RabbitMQ with success. But the problem keeps occurring, and I can't tell why. Any ideas?
Also, as a disclaimer, I have no experience with Erlang; I'm only using RabbitMQ because it's the broker favored by Celery.
Thanks in advance, this problem is really annoying me because I keep doing the same fix over and over.
The persister is RabbitMQ's internal message database. That "log" is presumably like a database log and deleting it will cause you to lose messages. I guess it's getting corrupted by unclean broker shutdowns, but that's a bit beside the point.
It's interesting that you're getting an error in the rabbit_persister
module. The last version of RabbitMQ that has that file is 2.2.0, so I'd strongly advise you to upgrade. The best version is always the latest, which you can get by using the RabbitMQ APT repository. In particular, the persister has seen a fairly large amount of fixes in the versions after 2.2.0, so there's a big chance your problem has already been resolved.
If you still see the problem after upgrading, you should report it on the RabbitMQ Discuss mailing list. The developers (of both Celery and RabbitMQ) make a point of fixing any problems reported there.