amazon-web-services apache-spark apache-spark-sql aws-glue

Maximum number of concurrent tasks in 1 DPU in AWS Glue


A standard DPU in AWS Glue comes with 4 vCPUs and 2 executors. I am confused about the maximum number of tasks that can run in parallel with this configuration. Is it 4 or 8 on a single DPU with 4 vCPUs and 2 executors?


Solution

  • I had a similar discussion with the AWS Glue support team, so I'll share what they told me about Glue configuration. Take, for example, the Standard and the G.1X configurations.

    Standard DPU Configuration:

    • 1 DPU reserved for MasterNode
    • 1 executor reserved for Driver/ApplicationMaster
    • Each DPU is configured with 2 executors
    • Each executor is configured with 5.5 GB memory
    • Each executor is configured with 4 cores

    G.1X WorkerType Configuration:

    • 1 DPU added for MasterNode
    • 1 DPU reserved for Driver/ApplicationMaster
    • Each worker is configured with 1 executor
    • Each executor is configured with 10 GB memory
    • Each executor is configured with 8 cores

    For example, a job with the Standard configuration and 21 DPUs has:

    • 1 DPU reserved for Master
    • 20 DPU x 2 = 40 executors
    • 40 executors - 1 Driver/AM = 39 executors

    That leaves 39 executors x 4 cores = 156 cores in total, meaning your job has 156 slots for execution. If, for example, you read files from S3, you will be able to process 156 input files in parallel. By the same figures, a single Standard DPU provides 2 executors x 4 cores = 8 task slots.
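
    The arithmetic above can be written as a small Python sketch. The function name and its parameters are made up for illustration and are not part of any Glue API. The defaults are simply the Standard figures quoted above, and the result assumes Spark's default of one core per task (spark.task.cpus = 1).

```python
def standard_task_slots(dpus, executors_per_dpu=2, cores_per_executor=4):
    """Estimate concurrent task slots for a Standard-configuration Glue job.

    Hypothetical helper: the defaults come from the figures quoted in this
    answer, not from an official API.
    """
    if dpus < 2:
        raise ValueError("need at least 2 DPUs: 1 is reserved for the master node")

    worker_dpus = dpus - 1                           # 1 DPU reserved for the master node
    total_executors = worker_dpus * executors_per_dpu
    usable_executors = total_executors - 1           # 1 executor reserved for the driver/AM
    return usable_executors * cores_per_executor     # one task per core


print(standard_task_slots(21))  # 156, matching the 21 DPU example
```

    The per-worker values are parameters, so you can swap in other figures if your job uses different settings.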

    Hope it helps.