Tags: django, docker, celery, amazon-ecs, aws-fargate

Has anyone implemented a Django Celery worker as a Docker container that only runs when a task is assigned?


I was able to successfully deploy a Django Celery worker as a Docker container to an AWS ECS service using Fargate for compute.

But my concern is that the Celery container runs 24/7. If I could run the container only when a task is assigned, I could save a lot of money under the AWS Fargate billing model.


Solution

  • Celery isn't really the right tool here because it's designed to run as a persistent worker, but the goal should be reasonably easy to achieve.

    Architecturally, you probably want to run a script on a Fargate task. The script chews through the queue and then dies. You'd trigger that task somehow:

    • An API call from your data receiver (e.g. Django)
    • A lambda function (triggered by what?)

    There are still some open questions: do you limit yourself to one task at a time, or do you need to manage concurrent requests to the queue? Do you retry failures? But it's a plausible place to start.
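    The "chew through the queue and then die" script might look like this; a minimal sketch assuming an SQS queue (the queue URL and the message handler are placeholders), with the drain loop factored out so it works with any fetch function:

    ```python
    def drain(fetch, handle):
        """Repeatedly fetch batches of messages and process them.
        Returns the number processed once the queue comes back empty,
        at which point the process (and the Fargate task) can exit."""
        processed = 0
        while True:
            batch = fetch()
            if not batch:
                return processed
            for msg in batch:
                handle(msg)
                processed += 1

    if __name__ == "__main__":
        import boto3  # imported here so the drain loop is testable without AWS

        sqs = boto3.client("sqs")
        # Placeholder queue URL -- substitute your own.
        queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"

        def fetch():
            resp = sqs.receive_message(
                QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=2
            )
            msgs = resp.get("Messages", [])
            for m in msgs:
                sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=m["ReceiptHandle"])
            return msgs

        # Placeholder handler: replace with your real job processing.
        drain(fetch, handle=lambda m: print(m["Body"]))
        # Falling off the end here stops the container, and Fargate billing with it.
    ```

    The key design point is that the process exits on an empty queue instead of blocking for more work, which is what makes the pay-per-run model possible.
    
    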


    A not-recommended but perhaps easier approach would be to run a Celery worker inside your Django container (e.g. using supervisor) and use Fargate's autoscaling features. You'd always have the one Django container running to receive data. If the Celery worker on that container used up all the available resources, Fargate would scale the service by adding tasks; once the jobs were done, it'd remove the excess containers. You'd pay the "overhead" of Django in each container, but it could still cost less than an always-on Celery container and would certainly be simpler -- you leverage your Celery experience and avoid the extra layer of event handling.
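    With supervisor, the combined container's config might look roughly like this (the program names and the gunicorn/celery invocations are illustrative, assuming a project named `myproject`):

    ```ini
    ; /etc/supervisor/conf.d/app.conf -- both processes in one container
    [program:django]
    command=gunicorn myproject.wsgi:application --bind 0.0.0.0:8000
    autorestart=true

    [program:celery]
    command=celery -A myproject worker --loglevel=info --concurrency=2
    autorestart=true
    ```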

    EDIT: Another disadvantage of this version is that you still need to run Redis somewhere, and I've found the minimum cost for that to be relatively high.


    Based on my growing AWS experience, here's what you probably should do...

    1. Use AWS API Gateway as an always-on receiver of events/requests. You only pay per request: the free tier includes a million requests per month, and the next 300M cost $1.00 per million (see AWS pricing), so this is likely to be free or nearly so.
    2. While you have many options for responding to the request, an AWS Lambda function (which can be written in python) should have the least overhead.
    3. If your queue will run longer than a Lambda function allows (15 minutes), you'll need to have that Lambda function delegate the processing to e.g. a Fargate task.
    4. (Optional) If you want to use a Dockerhub container for your Fargate task: we experienced a bunch of issues with Tasks and Services failing to start due to rate limits at Dockerhub. We ended up wrapping our Fargate task in a Step Function that checked for this error specifically and retried.
    5. (Optional) If you need to limit concurrency, this SO answer suggests having your Lambda function check for an existing execution (of a Step Function or Fargate task). I was hoping there was something native on Fargate Tasks or Step Functions but I don't see anything.
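    Steps 2, 3, and 5 together might look roughly like the Lambda handler below; this is a sketch assuming boto3, with the cluster, task definition, and subnet names as placeholders, and the concurrency guard factored into a plain function:

    ```python
    def should_start(running_task_arns, limit=1):
        """Concurrency guard: only start a new worker task if fewer
        than `limit` are already running (step 5)."""
        return len(running_task_arns) < limit

    def handler(event, context):
        # boto3 is imported here so the guard above is testable without AWS.
        import boto3

        ecs = boto3.client("ecs")
        cluster = "worker-cluster"  # placeholder cluster name

        # Check for an already-running worker task before launching another.
        running = ecs.list_tasks(cluster=cluster, desiredStatus="RUNNING").get(
            "taskArns", []
        )
        if not should_start(running):
            return {"started": False, "reason": "worker already running"}

        # Delegate the long-running queue processing to Fargate (step 3),
        # since Lambda itself is capped at 15 minutes.
        ecs.run_task(
            cluster=cluster,
            launchType="FARGATE",
            taskDefinition="queue-worker",  # placeholder task definition
            networkConfiguration={
                "awsvpcConfiguration": {
                    "subnets": ["subnet-0123456789abcdef0"],  # placeholder
                    "assignPublicIp": "ENABLED",
                }
            },
        )
        return {"started": True}
    ```

    Note the `list_tasks` check is racy under concurrent invocations; if strict single-execution matters, the Step Function approach from the linked answer is safer.
    
    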

    I imagine this would represent a huge operating-cost savings over the always-on Fargate task and an ElastiCache Redis queue, but the up-front cost/hassle could exceed the savings.