python, amazon-web-services, aws-glue

AWS Glue - resource distribution when running concurrently


Background: A Step Functions workflow queries DynamoDB to get the number and names of the tables whose last load failed.

Based on the number of tables, a Map state runs one Glue job n times (a single job with a Python script). Each run gets a different input parameter (the table name), which the Map state passes in.
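
For readers who want to picture the setup, here is a minimal sketch of what such a Map state could look like, built as a Python dict and printed as Amazon States Language JSON. The job name, the `$.failed_tables` path, and the `--table_name` argument are hypothetical placeholders, not taken from the original workflow; only the `glue:startJobRun.sync` integration and the standard Map state fields are real.

```python
import json


def build_map_state(job_name: str, max_concurrency: int) -> dict:
    """Return a Map state that starts one Glue job run per table name.

    Assumptions (hypothetical): the state input holds a list of table
    names under "failed_tables", and the Glue script reads its input
    from a "--table_name" job argument.
    """
    return {
        "Type": "Map",
        "ItemsPath": "$.failed_tables",
        # Upper bound on how many iterations run at the same time.
        "MaxConcurrency": max_concurrency,
        "Iterator": {
            "StartAt": "RunGlueJob",
            "States": {
                "RunGlueJob": {
                    "Type": "Task",
                    # Starts the job run and waits for it to finish.
                    "Resource": "arn:aws:states:::glue:startJobRun.sync",
                    "Parameters": {
                        "JobName": job_name,
                        # "$" is the current array item, i.e. one table name.
                        "Arguments": {"--table_name.$": "$"},
                    },
                    "End": True,
                }
            },
        },
        "End": True,
    }


if __name__ == "__main__":
    print(json.dumps(build_map_state("reload-failed-table", 3), indent=2))
```

In a setup like this, `MaxConcurrency` would normally be kept at or below the job's own maximum concurrent runs setting, since Glue rejects runs that exceed that limit.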

In the Glue job's Runs tab I see n runs, each with a different input value, which is the expected result.

Question: Are these job runs (each has a unique ID) executed on separate virtual resources, or do they share one set of virtual compute resources when they run concurrently?


Solution

  • When you allow a Glue job to have concurrent runs, each run gets its own allocation, so the total resources (DPUs) in use are always the number of concurrent runs * the DPUs configured for each run.

    For example, suppose a Glue job is configured with 5 DPUs and allows up to 3 concurrent runs. With 2 concurrent runs, the DPUs in use are 5 DPUs * 2 runs = 10 DPUs; with 3 concurrent runs, they are 5 DPUs * 3 runs = 15 DPUs.
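
The arithmetic above can be written as a small helper. This is only an illustrative sketch: the function name and parameters are made up for this example and are not part of any AWS API.

```python
def total_dpus_in_use(dpus_per_run: float, concurrent_runs: int,
                      max_concurrent_runs: int) -> float:
    """Total DPUs consumed when each concurrent run gets its own allocation.

    All names here are hypothetical; the rule is simply
    total = concurrent runs * DPUs configured per run.
    """
    if concurrent_runs < 0:
        raise ValueError("concurrent_runs must not be negative")
    if concurrent_runs > max_concurrent_runs:
        # The job's concurrency limit caps how many runs can be active at once.
        raise ValueError("concurrent_runs exceeds the job's concurrency limit")
    return dpus_per_run * concurrent_runs


if __name__ == "__main__":
    # The example from the answer: 5 DPUs per run, up to 3 concurrent runs.
    for runs in (2, 3):
        print(runs, "runs ->", total_dpus_in_use(5, runs, 3), "DPUs")
```

A helper like this is mainly useful for estimating cost or checking account-level DPU limits before raising the Map state's concurrency.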