amazon-ecs, dagster

Dagster: Why do I need to specify ECS cluster name, and separately its CPU/memory requirements?


In Dagster Cloud's documentation for AWS, under "Per-deployment configuration", I see that I need to specify both:

  1. Cluster name (under "cluster")
  2. Requested server resources (under "server_resources", namely CPU and memory)

As far as I understand, an ECS cluster contains homogeneous machines. If so, wouldn't the cluster name already indicate the machines available?

What would happen if I requested resources of type X, while the cluster contained resources of (a different) type Y?


Solution

  • "ECS cluster contains homogeneous machines"

    An ECS cluster doesn't have to use the same machine type for all its capacity.

    Each cluster can have one or more capacity providers. The two main capacity provider types are Fargate and EC2.

    Fargate is available to all AWS accounts. AWS manages the Fargate service for us, and ECS exposes no provisioning details about it. Fargate itself may use multiple machine types, but we have no way to know.

    An EC2 capacity provider is basically an auto-scaling group. You can use multiple instance types within a single auto-scaling group. Even if one auto-scaling group uses only one machine type, you can add another capacity provider backed by a second auto-scaling group with a different type.

    So there are three ways to have heterogeneous capacity in the cluster:

    • Use both the Fargate and EC2 capacity providers
    • Use the EC2 capacity provider with an auto-scaling group with a mixed instances policy
    • Use the EC2 capacity provider with many auto-scaling groups with different launch configurations

    "Wouldn't the cluster name indicate the machines available?"

    Yes, but a task definition isn't tied directly to a machine, and a task doesn't have to use any EC2 machine at all.

    In any case a machine can run multiple tasks.

    A task has a capacity provider strategy. The strategy may specify either Fargate or EC2 capacity providers, but not both.
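
    The "either Fargate or EC2, but not both" rule can be sketched as a small check. This is only an illustration: `strategy_kind` is a hypothetical helper, not part of any AWS SDK, and it assumes strategy items shaped like the `capacityProviderStrategy` entries that ECS accepts (dictionaries with a `capacityProvider` name), with `FARGATE` and `FARGATE_SPOT` being the Fargate provider names.

```python
# Names of the AWS-managed Fargate capacity providers.
FARGATE_PROVIDERS = {"FARGATE", "FARGATE_SPOT"}


def strategy_kind(strategy):
    """Classify a capacity provider strategy as 'fargate' or 'ec2'.

    `strategy` is a list of dicts such as
    {"capacityProvider": "FARGATE", "weight": 1}.
    Raises ValueError if the list is empty or mixes both kinds.
    """
    if not strategy:
        raise ValueError("strategy must contain at least one item")

    is_fargate = [item["capacityProvider"] in FARGATE_PROVIDERS for item in strategy]
    if all(is_fargate):
        return "fargate"
    if not any(is_fargate):
        # Anything else is assumed to be an EC2 (auto-scaling group) provider.
        return "ec2"
    raise ValueError("a strategy cannot mix Fargate and EC2 capacity providers")
```

    For example, a strategy that lists `FARGATE` and `FARGATE_SPOT` is classified as "fargate", while one that lists `FARGATE` next to a provider named after your own auto-scaling group is rejected.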

    If the task uses Fargate, then there are no available machines, just a pool from which ECS allocates resources.

    If the task uses EC2, then it uses one or more auto-scaling groups. ECS calls the machines in these groups container instances.

    ECS uses task placement to decide how to run a task on an EC2 machine. I think each task must run on a single machine, but I can't find that in the documentation. A container certainly must run on a single machine, but a task can have multiple containers.

    "What would happen if I requested resources of type X, while the cluster contained resources of type Y?"

    The task definition documentation explains this behavior.

    For Fargate, we don't know Y. You choose X from a list of supported combinations of CPU and memory. If you choose an unsupported combination, I suppose the task fails (I haven't tested this, and I can't find a description in the documentation).
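
    To make "supported combinations" concrete, here is a minimal sketch of a local check. The table is transcribed from the Fargate task size documentation as I understand it and may be out of date, so treat the values as an assumption; `is_valid_fargate_size` and the table name are hypothetical, not an AWS API.

```python
def _gib_range(start, stop, step=1):
    """Memory values in MiB from `start` to `stop` GiB inclusive, every `step` GiB."""
    return {gib * 1024 for gib in range(start, stop + 1, step)}


# CPU units (1024 = 1 vCPU) -> allowed memory values in MiB.
# Assumed from the AWS task size table; verify against the current documentation.
FARGATE_MEMORY_BY_CPU = {
    256: {512, 1024, 2048},
    512: _gib_range(1, 4),
    1024: _gib_range(2, 8),
    2048: _gib_range(4, 16),
    4096: _gib_range(8, 30),
    8192: _gib_range(16, 60, 4),
    16384: _gib_range(32, 120, 8),
}


def is_valid_fargate_size(cpu_units, memory_mib):
    """Return True if (cpu_units, memory_mib) appears in the table above."""
    return memory_mib in FARGATE_MEMORY_BY_CPU.get(cpu_units, set())
```

    With this table, 256 CPU units with 512 MiB passes, while 256 CPU units with 4096 MiB does not, because the smallest CPU size only pairs with up to 2048 MiB.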

    For EC2, as long as X is no larger than what a machine has available (Y), your task can probably run. If your cluster doesn't have any machine with the requested memory or CPU available, then the task fails.
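
    The EC2 case can be pictured as a simple fit test. The sketch below is a simplification under stated assumptions: `ContainerInstance` and `instances_that_fit` are hypothetical names, and real ECS task placement also considers ports, placement constraints, and placement strategies, which are ignored here.

```python
from dataclasses import dataclass


@dataclass
class ContainerInstance:
    """Simplified view of an EC2 machine registered in the cluster."""
    name: str
    remaining_cpu: int     # CPU units still unreserved (1024 = 1 vCPU)
    remaining_memory: int  # MiB still unreserved


def instances_that_fit(instances, cpu, memory):
    """Return the names of instances with enough spare CPU and memory for a task."""
    return [
        inst.name
        for inst in instances
        if inst.remaining_cpu >= cpu and inst.remaining_memory >= memory
    ]


cluster = [
    ContainerInstance("small", remaining_cpu=1024, remaining_memory=2048),
    ContainerInstance("large", remaining_cpu=4096, remaining_memory=16384),
]
```

    Here a task requesting 2048 CPU units and 4096 MiB fits only on "large", and a task requesting 32768 MiB fits nowhere, which corresponds to the case where the task cannot be placed.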


    The EC2 behavior is analogous to running a Docker container on your own development machine. A container doesn't use all the resources of your computer. The smaller the containers, the more of them you can run on one machine at a time. Docker allows you to set the maximum memory and CPU allocation for a container via runtime resource constraints.