Tags: django, architecture, rabbitmq, celery, multiserver

Django + RabbitMQ + Celery all on different machines (Servers)


I managed to get Django, RabbitMQ, and Celery working on a single machine. I followed the instructions from here. Now I want to make them work together, but in a situation where they are on different servers. I do not want Django to know anything about Celery, nor Celery about Django.

So, basically I just want Django to send some message to a RabbitMQ queue (probably an id, the type of the task, maybe some other info), and then I want RabbitMQ to publish that message (when it's possible) to Celery on another server. Celery/Django should not know about each other; basically I want an architecture where it is easy to replace any one of them.

Right now I have in my Django several calls like

create_project.apply_async(args, countdown=10)

I want to replace that with similar calls directly to RabbitMQ (as I said, Django should not depend on Celery). Then, RabbitMQ should notify Celery (when it is possible) and Celery will do its job (probably interact with Django, but through a REST interface).

Also, I need to have Celery workers on two or more servers and I want RabbitMQ to notify only one of them depending on some field in the message. If this is too complicated I could just check in every task (on different machines) something like: is this something you should do (like checking an IP address field in the message) and if it's not, then just stop executing the task.

How can I achieve this? If possible I would prefer code + configuration examples, not just a theoretical explanation.

Edit:

I think that for my use case Celery is pure overhead. Simple RabbitMQ routing with custom clients will do the job. I already tried a simple use case (one server) and it works perfectly. It should be easy to make the communication multi-server ready. I do not like Celery. It is "magical", hides too many details and is not easy to configure. But I will leave this question alive, because I am interested in others' opinions.
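
For reference, a minimal sketch of the kind of custom client I mean, using pika (the host, queue name, and payload fields are illustrative, not my actual code):

import json
import pika

# Connect to the broker and publish a plain JSON message describing the task.
connection = pika.BlockingConnection(pika.ConnectionParameters(host='rabbitmq-host'))
channel = connection.channel()
channel.queue_declare(queue='create_project', durable=True)
channel.basic_publish(
    exchange='',
    routing_key='create_project',
    body=json.dumps({'id': 42, 'type': 'create_project'}),
    properties=pika.BasicProperties(delivery_mode=2),  # persistent message
)
connection.close()

A consumer on the other server would then read from the same queue and act on the message.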


Solution

  • The short of it

    How can I achieve this?

    Celery only sends the task name and a serialized set of parameters as the message body. That is, your scenario is absolutely in line with how Celery operates.

    If possible I would prefer code + configuration examples, not just a theoretical explanation.

    For the client app, i.e. your Django app, define stub tasks, like so:

    from celery import shared_task

    @shared_task
    def foo():
        pass  # stub: never executed locally, calling it only publishes the message
    

    For the Celery processing, on your remote server, define the actual tasks to be executed.

    from celery import shared_task

    @shared_task
    def foo():
        pass  # the actual implementation, executed by the worker
    

    It is important that the tasks live under the same Python import path on both sides (e.g. app/tasks.py), otherwise Celery won't be able to match the message to the actual task.
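
    Celery matches the message to a task by its fully qualified name, so a quick way to check that both sides agree (the module path here is just an example):

    >>> foo.name
    'app.tasks.foo'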

    Note that this also means your Django app becomes untestable if you have set CELERY_ALWAYS_EAGER=True, unless you make the Celery app's tasks.py available locally to the Django app.
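
    A minimal sketch of such a test configuration, using the same old-style setting names as above (assumed for illustration):

    # Hypothetical Django test settings: with eager mode on, apply_async()
    # runs the task body in-process, so the stub's empty body is what executes.
    CELERY_ALWAYS_EAGER = True
    CELERY_EAGER_PROPAGATES_EXCEPTIONS = True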

    Even Simpler Alternative

    An alternative to the above stub tasks is to send tasks by name:

    >>> app.send_task('tasks.add', args=[2, 2], kwargs={})
    <AsyncResult: 373550e8-b9a0-4666-bc61-ace01fa4f91d>
    
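    A minimal sketch of the app instance behind that call (the broker URL is an assumption); the sending side only needs to know the broker, not the task implementations:

    from celery import Celery

    # Point the Celery client at the RabbitMQ broker on the other server.
    app = Celery(broker='amqp://guest:guest@rabbitmq-host:5672//')
    result = app.send_task('tasks.add', args=[2, 2], kwargs={})
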

    On Message Patterns

    Also, I need to have Celery workers on two or more servers and I want RabbitMQ to notify only one of them depending on some field in the message.

    RabbitMQ offers several messaging patterns; their tutorials are quite well written and to the point. What you want (one message processed by one worker) is trivially achieved with a simple queue/exchange setup, which (with Celery at least) is the default if you don't do anything else. If you need specific workers to attend to specific tasks/respond to specific messages, use Celery's task routing, which works hand in hand with RabbitMQ's concept of queues and exchanges.
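
    A hedged sketch of such a routing configuration (the queue and task names are assumptions, using the old-style setting names to match the rest of this answer):

    # In the client-side settings: route specific tasks to specific queues.
    CELERY_ROUTES = {
        'app.tasks.create_project': {'queue': 'projects'},
        'app.tasks.encode_video': {'queue': 'video'},
    }

    Then start each worker listening only on the queue(s) it should handle, e.g. celery worker -Q projects on one server and celery worker -Q video on the other.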

    Trade-Offs

    I think that for my use case Celery is pure overhead. Simple RabbitMQ routing with custom clients will do the job. I already tried a simple use case (one server) and it works perfectly.

    Of course, you may use RabbitMQ out of the box, at the cost of having to deal with the lower-level API that RabbitMQ provides. Celery adds a task abstraction that makes it very straightforward to build any producer/consumer scenario, essentially using just plain Python functions or methods. Note that this is not a better/worse judgement of either RabbitMQ or Celery -- as always with engineering decisions, there is a trade-off involved:

    • If you use Celery, you probably lose some of the flexibility of the RabbitMQ API, but you gain ease and speed of development and lower deployment complexity -- it basically just works.

    • If you use RabbitMQ directly, you gain flexibility, but with this comes deployment complexity that you need to manage yourself.

    Depending on the requirements of your project, either approach may be valid -- your call, really.

    Any sufficiently advanced technology is indistinguishable from magic ;-)

    I do not like Celery. It is "magical", hides too many details and is not easy to configure.

    I would choose to disagree. It may be "magical" in Arthur C. Clarke's sense, but it certainly is rather easy to configure if you compare it to a plain RabbitMQ setup. Of course if you're also the guy who does the RabbitMQ setup, it may just add a layer of abstraction that you don't really gain anything from. Maybe your developers will?