
Working Around Concurrency Limits in AWS Glue


I have a question about how best to manage concurrent job instances in AWS Glue.

I have a job defined like so:

job = client.create_job(
    Name='JOB_NAME',
    Role='the-role-name',
    ExecutionProperty={
        'MaxConcurrentRuns': 25
    },
    Command={
        'Name': 'glueetl',
        'ScriptLocation': script_location,
        'PythonVersion': '3'
    },
    Tags={
        'Application': 'app',
        'Project': 'proj'
    },
    GlueVersion='2.0',
    WorkerType='G.2X',
    NumberOfWorkers=50
)

I want to call about 1000 instances of this job like so:

def run_job(f):
    response = client.start_job_run(
        JobName=JOB_NAME,
        Arguments={
            '--start_date': start_date,
            '--end_date': end_date,
            '--factor': f
        }
    )
    return response


for f in factors:
    response = run_job(f)
    print(f"response: {response}")

There are two issues with this approach: (1) firing off all of these requests at once throws a throttling error, and (2) even if I sleep between job starts, I still run up against the concurrency limit, which is 50.

Does anyone know an easy way to work around these issues?


Solution

  • The "Max concurrent job runs per account" limit is a soft limit (https://docs.aws.amazon.com/general/latest/gr/glue.html). Maybe log a service request with AWS and ask for an increase in the limit. The second thing is I am not sure how you have implemented your sleep action in the code, maybe instead of doing just a sleep catch the exception each time you make the call, if there is an exception, sleep with an exponential backoff in seconds and try again when sleep time is finished and repeat until your get a positive response OR when you reach your own set limit to stop. This way your processing will not stop until you give up, but just slow down when throtteling kicks in.