django celery api-design task-queue

Should I use a task queue (Celery), asyncio or neither for an API that polls other APIs?


I have written an API with Django whose purpose is to act as a bridge between a website back-end and the external services we use, so that the website doesn't have to handle many requests to external APIs (CRM, calendar events, email providers etc.).

The API mainly polls other services, parses the results and forwards them to the website backend.

I initially went for a Celery-based task queue, as it seemed to me like the right tool to offload that processing to another instance, but I'm starting to think it doesn't really fit the purpose.

As the website expects synchronous responses, my code contains a lot of:

results = my_task.delay().get()

or

results = chain(fetch_results.s(), parse_results.s()).delay().get()

Which doesn't feel like the proper way to use Celery tasks.

It is efficient when pulling dozens of requests and processing the results in parallel - a periodic refresh task for example - but adds a lot of overhead for simple requests (fetch - parse - forward), which represent most of the traffic.

Should I go fully synchronous for those "simple requests" and keep Celery tasks for specific scenarios? Is there an alternative design (maybe involving asyncio) that would better suit the purpose of my API?


Using Django, Celery (w/ Amazon SQS) on an EBS EC2 instance.


Solution

  • You could consider using Gevent with your Django webserver so that it handles the "simple requests" you've mentioned efficiently, without a worker being blocked while it waits on an external API. If you proceed with this approach, be sure to pool database connections with PgBouncer, Pgpool-II or a Python library, since each greenlet will make its own connection.

    Once you've implemented that, it's also possible to use Gevent instead of Celery for the asynchronous processing: spawn multiple greenlets that each make an external API request and join on them, rather than incurring the overhead of passing messages to an external Celery worker.

    Your implementation is similar to what we've done at Kloudless, which provides a single API to access multiple other APIs, including CRM, calendar, storage, etc.
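
The greenlet fan-out described above could look roughly like the sketch below. It is only an illustration under stated assumptions: `fetch_crm_contacts`, `fetch_calendar_events`, `parse` and `fetch_all` are hypothetical names, and the network calls are simulated with `gevent.sleep` rather than real HTTP requests.

```python
import gevent

# Hypothetical stand-ins for calls to external services. A real version would
# issue HTTP requests, with gevent.monkey.patch_all() applied at startup so
# that blocking socket calls yield to other greenlets.
def fetch_crm_contacts():
    gevent.sleep(0.05)  # simulated network latency
    return {"source": "crm", "items": ["alice", "bob"]}

def fetch_calendar_events():
    gevent.sleep(0.05)  # simulated network latency
    return {"source": "calendar", "items": ["standup"]}

def parse(raw):
    # Placeholder parsing step: count the items returned by each source.
    return {raw["source"]: len(raw["items"])}

def fetch_all(fetchers, timeout=5):
    # Spawn one greenlet per external call, then wait for all of them
    # (or until the timeout expires).
    jobs = [gevent.spawn(fetcher) for fetcher in fetchers]
    gevent.joinall(jobs, timeout=timeout)
    merged = {}
    for job in jobs:
        # Skip greenlets that raised an exception or did not finish in time.
        if job.successful():
            merged.update(parse(job.value))
    return merged
```

Calling `fetch_all([fetch_crm_contacts, fetch_calendar_events])` from inside a view runs both simulated requests concurrently in the web process, so the response can still be returned synchronously without a round trip through a message broker.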