Tags: python, django, multiprocessing, celery, concurrent.futures

Celery vs. ProcessPoolExecutor / ThreadPoolExecutor


I am creating a Django webserver that allows the user to run some "executables" on a local machine and analyse their output through a webpage.

I have previously used a Celery task queue to run "executables" in similar situations. However, after reading up on Python's concurrent.futures, I am beginning to wonder if I should use ThreadPoolExecutor or ProcessPoolExecutor (or a ThreadPoolExecutor inside a ProcessPoolExecutor :D) instead?

Googling, I could only find one relevant question comparing Celery to Tornado, and it steered towards using Tornado alone.

So should I use Celery or a PoolExecutor for my simple webserver, and why?


Solution

  • You need to use Celery if:

    1. You want to scale easily and independently from your webserver
    2. You want a way to monitor your task and retry them if they fail
    3. You want to create more advanced task execution patterns (e.g. chaining them)

    In addition, Celery is a very mature library with side projects that also help you on the UI presentation side; have a look at Jobtastic.
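
    For example, a minimal sketch of points 2 and 3, assuming a Django project with a Celery app already wired up (the task names, module, and executable path are hypothetical):

        # tasks.py -- hypothetical module; assumes the Celery app is already configured
        import subprocess
        from celery import shared_task, chain

        @shared_task(bind=True, max_retries=3, default_retry_delay=10)
        def run_executable(self, path):
            """Run a local executable; retry up to 3 times if it exits non-zero."""
            try:
                completed = subprocess.run([path], capture_output=True, text=True, check=True)
                return completed.stdout
            except subprocess.CalledProcessError as exc:
                raise self.retry(exc=exc)

        @shared_task
        def analyse_output(stdout):
            """Placeholder analysis step; receives run_executable's return value."""
            return {"lines": len(stdout.splitlines())}

        # Point 3: chain the two tasks so analysis runs after the executable finishes.
        workflow = chain(run_executable.s("/usr/local/bin/my_tool"), analyse_output.s())
        # workflow.delay()  # enqueue the whole chain on the broker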

    If you don't need any of the listed points and you just need to execute this task without caring too much about status, and without particular scalability needs, then just keep it simple.
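
    A "keep it simple" version could be nothing more than a shared executor inside the Django app; the module layout, view, and path below are a hypothetical sketch, not a fixed recipe:

        # views.py -- sketch: run the executable on a background thread of the web process
        import subprocess
        from concurrent.futures import ThreadPoolExecutor

        from django.http import JsonResponse

        # One pool per web process; threads are enough here because the external
        # program does the heavy work and the subprocess call releases the GIL.
        executor = ThreadPoolExecutor(max_workers=4)

        def run_tool(path):
            completed = subprocess.run([path], capture_output=True, text=True)
            return completed.returncode, completed.stdout

        def launch(request):
            executor.submit(run_tool, "/usr/local/bin/my_tool")  # hypothetical path
            # A real view would keep the Future (or persist its result) so the
            # webpage can poll for the output later.
            return JsonResponse({"submitted": True})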

    As for choosing between ThreadPoolExecutor and ProcessPoolExecutor, keep in mind that the latter can only receive and return picklable objects, while the former spawns child threads attached to your main process (probably your webserver, if you are not running it inside another detached process). So the approach of mixing them can make sense depending on the details of your implementation.
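
    The pickling constraint is easy to see in isolation (the function below is purely illustrative):

        from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

        def double(x):  # module-level function: picklable, so a process pool accepts it
            return 2 * x

        if __name__ == "__main__":
            with ProcessPoolExecutor() as pool:
                print(pool.submit(double, 21).result())           # works: 42
                # pool.submit(lambda x: 2 * x, 21).result()       # fails: lambdas can't be pickled

            with ThreadPoolExecutor() as pool:
                print(pool.submit(lambda x: 2 * x, 21).result())  # fine: same process, no pickling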