Tags: python, amazon-web-services, celery, amazon-ecs, aws-fargate

Running Celery workers on ECS Fargate


I am working on a project that uses AWS ECS, and I want to use Celery as a distributed task queue. The Celery worker could run on the EC2 launch type, but the instance would sit idle most of the time, so I think it would be more cost-effective to have AWS Fargate run the job and exit immediately.

Do you have suggestions on how to run Celery workers efficiently in the AWS cloud?


Solution

  • The Fargate launch type takes longer to spin up than the EC2 launch type, because AWS does all the "host things" for you when the task starts, including the notoriously slow ENI attachment and, most likely, pulling the image from a Docker repository. Right now there's no contest: the EC2 launch type is faster every time.

    So it really depends on the type of work you want the workers to do. For the reasons above, expect a new Fargate task to take a few minutes to reach the RUNNING state. An EC2 launch type task, on the other hand, moves from PENDING to RUNNING very quickly, because the ENI is already in place on your host and the image is already downloaded (at best) or mostly downloaded (the likely worst case).

    Edit: As @Rocket04 points out in a comment below, it appears AWS has improved Fargate startup times for scaling applications. Hooray!


    Use EC2 launch type for steady workloads, use Fargate launch type for burst capacity

    This is the current prevailing wisdom, usually framed as a cost question: when this was written, Fargate could not take advantage of the typical EC2 cost-saving mechanisms such as reserved instances and spot pricing (AWS has since added Fargate Spot and Savings Plans coverage for Fargate). Running Fargate all the time is expensive compared to EC2.

    To be clear, it's perfectly fine to run 100% in Fargate (we do), but you have to be willing to accept the downsides of doing that: slower scaling and higher cost.
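The cost trade-off above can be sketched as a break-even calculation. This is a minimal illustration, assuming an EC2 instance is billed for every hour it exists while a Fargate task is billed only while it runs. The function name and both prices are hypothetical placeholders, not real AWS rates; substitute current pricing for your region.

```python
def break_even_utilization(ec2_hourly: float, fargate_hourly: float) -> float:
    """Fraction of the time workers must be busy before always-on EC2 is cheaper.

    EC2 cost per hour is constant; Fargate cost per hour is the Fargate
    rate multiplied by the fraction of the hour a task is actually running.
    """
    if ec2_hourly <= 0 or fargate_hourly <= 0:
        raise ValueError("hourly prices must be positive")
    # Never report more than 100%: beyond that Fargate is always cheaper.
    return min(1.0, ec2_hourly / fargate_hourly)


# Hypothetical prices for equivalent capacity, NOT real AWS rates.
EC2_HOURLY = 0.04
FARGATE_HOURLY = 0.10

threshold = break_even_utilization(EC2_HOURLY, FARGATE_HOURLY)
print(f"EC2 is cheaper once workers are busy more than {threshold:.0%} of the time")
```

If the workers are idle most of the day, as in the question, the busy fraction sits well below the threshold and Fargate comes out ahead despite its higher hourly rate.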

    Note that you can run both launch types in the same cluster. Clusters are only a logical grouping anyway, a way to organize your resources.


    Example cluster

    This example shows a static EC2 launch type service running 4 Celery tasks. The number of tasks, the specs and the instance size don't really matter; set them up however you like. The important point is that the EC2 launch type service doesn't need to scale, while the Fargate launch type service can scale from nothing running (during periods with little or no work) up to as many workers as you can handle, based on your scaling rules.
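One way to picture a scaling rule for the burst service is a function that maps queue depth to a worker count. This is only a sketch: the function name and the `jobs_per_worker` parameter are assumptions for illustration, and the default cap of 32 mirrors the example's max tasks.

```python
import math


def desired_workers(queued_jobs: int, jobs_per_worker: int, max_workers: int = 32) -> int:
    """Return how many burst workers to run for the current backlog.

    Scales to zero when the queue is empty and never exceeds max_workers.
    """
    if jobs_per_worker < 1:
        raise ValueError("jobs_per_worker must be at least 1")
    if queued_jobs <= 0:
        return 0
    # Round up so a partial batch of jobs still gets a worker.
    return min(max_workers, math.ceil(queued_jobs / jobs_per_worker))


for backlog in (0, 1, 95, 1000):
    print(backlog, "->", desired_workers(backlog, jobs_per_worker=10))
```

Driving the count from queue depth rather than worker CPU is a deliberate choice here: with zero tasks running there are no worker metrics to scale on, so the signal has to come from the broker side.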

    EC2 launch type Celery service

    Running 1 t3.medium container instance (2 vCPU / 4 GB) for the EC2 launch type.

    Min tasks: 2, Desired: 4, Max tasks: 4

    Running 4 Celery tasks at 512 CPU units / 1024 MiB memory each on this instance.

    No scaling policies
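A quick capacity check for the EC2 service above, as a rough sketch. It assumes "512/1024" means 512 CPU units and 1024 MiB per task, and uses the ECS convention of 1024 CPU units per vCPU. The function name is made up for illustration, and the 3800 MiB figure is a hypothetical stand-in for the memory an instance actually registers after the OS and ECS agent take their share.

```python
def tasks_that_fit(instance_cpu_units: int, instance_memory_mib: int,
                   task_cpu_units: int, task_memory_mib: int) -> int:
    """Number of identical tasks one instance can hold.

    Whichever resource runs out first (CPU or memory) sets the limit.
    """
    by_cpu = instance_cpu_units // task_cpu_units
    by_memory = instance_memory_mib // task_memory_mib
    return min(by_cpu, by_memory)


# Nominal t3.medium: 2 vCPU = 2048 CPU units, 4 GB = 4096 MiB.
print(tasks_that_fit(2048, 4096, 512, 1024))  # 4

# Hypothetical registered memory, slightly below nominal.
print(tasks_that_fit(2048, 3800, 512, 1024))  # 3
```

On paper the four tasks fill the instance exactly, so it is worth checking the instance's registered memory in the ECS console and trimming the task memory slightly if the fourth task stays in PENDING.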

    Fargate launch type Celery service

    Min tasks: 0, Desired: (x), Max tasks: 32

    Running (x) Celery tasks (same task definition as the EC2 launch type service) at 512 CPU units / 1024 MiB memory each

    Add scaling policies to this service
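Before scaling policies can be attached, the Fargate service has to be registered as a scalable target with Application Auto Scaling. The sketch below only builds the request parameters; the cluster and service names are hypothetical, and the min/max values are taken from the example above.

```python
def scalable_target_params(cluster: str, service: str,
                           min_tasks: int = 0, max_tasks: int = 32) -> dict:
    """Build the arguments for Application Auto Scaling's register_scalable_target."""
    return {
        "ServiceNamespace": "ecs",
        # ECS services are addressed as service/<cluster-name>/<service-name>.
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "MinCapacity": min_tasks,
        "MaxCapacity": max_tasks,
    }


# Hypothetical cluster and service names.
params = scalable_target_params("celery-cluster", "celery-fargate")
print(params)

# With boto3 installed and AWS credentials configured, this could be applied with:
#   import boto3
#   boto3.client("application-autoscaling").register_scalable_target(**params)
```

The scaling policies themselves (for example via `put_scaling_policy` on the same client) are added afterwards and reference the same `ResourceId` and `ScalableDimension`.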