
Retrieve metadata for the last runs of multiple job definitions with a single API call in AWS Glue (Boto3)


I'm working with the AWS Glue API using Boto3, specifically the get_job_runs method described in the Boto3 Glue client documentation. This method retrieves metadata for all runs of a given job definition.
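For reference, here is a minimal sketch of retrieving every run of one job with get_job_runs, using the Boto3 paginator so that NextToken is followed automatically. The helper name list_all_runs is hypothetical; it accepts any Glue client, for example boto3.client("glue").

```python
from typing import Any, Dict, List


def list_all_runs(glue_client: Any, job_name: str) -> List[Dict[str, Any]]:
    """Return every JobRun record for one job definition (hypothetical helper)."""
    runs: List[Dict[str, Any]] = []
    # The Glue client provides a paginator for GetJobRuns that handles NextToken.
    paginator = glue_client.get_paginator("get_job_runs")
    for page in paginator.paginate(JobName=job_name):
        runs.extend(page.get("JobRuns", []))
    return runs
```

Note that GetJobRuns takes exactly one JobName per request, which is why covering several jobs requires several calls.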

However, I have a scenario where I need to retrieve metadata for the last runs of multiple job definitions, and I'm looking for a way to achieve this with a single API call. Currently, I'm making separate API calls for each job definition, which is not very efficient.
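A minimal sketch of the per-job approach described above: one get_job_runs call per job with MaxResults=1. It assumes GetJobRuns returns a job's most recent run first, which is how it commonly behaves but is worth verifying in your account. The function last_runs is hypothetical.

```python
from typing import Any, Dict, Iterable, Optional


def last_runs(glue_client: Any, job_names: Iterable[str]) -> Dict[str, Optional[Dict[str, Any]]]:
    """Map each job name to its latest JobRun, or None if the job has never run."""
    result: Dict[str, Optional[Dict[str, Any]]] = {}
    for name in job_names:
        # One request per job: GetJobRuns accepts a single JobName only.
        # Assumption: the newest run comes first, so MaxResults=1 is enough.
        response = glue_client.get_job_runs(JobName=name, MaxResults=1)
        runs = response.get("JobRuns", [])
        result[name] = runs[0] if runs else None
    return result
```

Called as last_runs(boto3.client("glue"), ["job-a", "job-b"]) (job names hypothetical), the number of requests grows linearly with the number of jobs, and they run one after another.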

Is there a way to optimize this process and retrieve the metadata for the last runs of multiple job definitions using a single API call? Any guidance or code examples using Boto3 would be greatly appreciated. Thank you!


Solution

  • Update: Depending on the number of jobs, the parallel GetJobRuns calls may exceed the API request quota and be throttled. In that case, the Map state can be configured to retry automatically. Retrying won't reduce the execution time, but you don't need to handle the errors yourself. And compared to a Lambda function, this approach is much more cost-effective.
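A sketch of what such a retry policy could look like, written as a Python structure that serializes to Amazon States Language. It is meant as the Retry field of the Task state that calls GetJobRuns inside the Map. The values are assumptions, and "States.ALL" matches every error, so you may want to narrow ErrorEquals to the throttling error your failed executions actually report. The helper retry_delays is hypothetical and only illustrates the exponential backoff arithmetic.

```python
import json
from typing import List

# Hypothetical Retry policy for the GetJobRuns Task state inside the Map.
# "States.ALL" retries on any error; narrow it if only throttling should be retried.
GET_JOB_RUNS_RETRY = [
    {
        "ErrorEquals": ["States.ALL"],
        "IntervalSeconds": 2,  # wait before the first retry
        "MaxAttempts": 5,      # stop after five retries
        "BackoffRate": 2.0,    # multiply the wait by this factor after each retry
    }
]


def retry_delays(interval: float, backoff_rate: float, max_attempts: int) -> List[float]:
    """Approximate wait before each retry: interval, interval*rate, interval*rate**2, ..."""
    return [interval * backoff_rate ** i for i in range(max_attempts)]


policy = GET_JOB_RUNS_RETRY[0]
print(json.dumps(GET_JOB_RUNS_RETRY))
print(retry_delays(policy["IntervalSeconds"], policy["BackoffRate"], policy["MaxAttempts"]))
# [2.0, 4.0, 8.0, 16.0, 32.0]
```

With these values a persistently throttled call waits roughly 62 seconds in total before the iteration fails.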


    At the time of writing (August 2023), AWS Glue doesn't provide an API that returns run data for multiple jobs in a single call.

    However, to reduce the execution time, consider using AWS Step Functions, specifically the Map state. It allows you to:

    1. Take your list of job names
    2. Directly call the AWS Glue GetJobRuns API for each job in parallel
    3. Aggregate the responses into a JSON array
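The three steps above can be sketched as a state machine definition, built here as a Python dict and serialized to Amazon States Language. It uses the Step Functions AWS SDK integration for Glue (arn:aws:states:::aws-sdk:glue:getJobRuns). The state names, the JobNames input key, and the MaxConcurrency value are hypothetical choices, and MaxResults=1 again assumes the newest run is returned first.

```python
import json

# Hypothetical state machine; expected input: {"JobNames": ["job-a", "job-b"]}
STATE_MACHINE_DEFINITION = {
    "Comment": "Fetch the latest run of each AWS Glue job in parallel",
    "StartAt": "GetLastRuns",
    "States": {
        "GetLastRuns": {
            "Type": "Map",
            "ItemsPath": "$.JobNames",  # iterate over the array of job names
            "MaxConcurrency": 10,       # assumption: cap concurrent GetJobRuns calls
            "ItemProcessor": {
                "ProcessorConfig": {"Mode": "INLINE"},
                "StartAt": "GetJobRuns",
                "States": {
                    "GetJobRuns": {
                        "Type": "Task",
                        # AWS SDK service integration: calls Glue GetJobRuns directly.
                        "Resource": "arn:aws:states:::aws-sdk:glue:getJobRuns",
                        "Parameters": {
                            "JobName.$": "$",  # each iteration's input is one job name
                            "MaxResults": 1,   # assumes the newest run comes first
                        },
                        "End": True,
                    }
                },
            },
            "End": True,  # Map output: one GetJobRuns response per job, in input order
        }
    },
}

DEFINITION_JSON = json.dumps(STATE_MACHINE_DEFINITION, indent=2)
print(DEFINITION_JSON)
```

The JSON string can be passed as the definition argument of the Step Functions create_state_machine call. The execution role you supply needs permission for glue:GetJobRuns.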

    Because Step Functions integrates directly with the Glue API through its AWS SDK service integrations, you don't need to write any custom code. The only input the state machine needs is an array of job names.
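From Boto3, the whole batch then becomes a single StartExecution request plus status polling. A minimal sketch, assuming a Standard workflow whose ARN you already have and whose input uses a JobNames key (both hypothetical); fetch_last_runs is an illustrative helper, not part of any library.

```python
import json
import time
from typing import Any, Dict, List


def fetch_last_runs(sfn_client: Any, state_machine_arn: str,
                    job_names: List[str], poll_seconds: float = 2.0) -> List[Dict[str, Any]]:
    """Start the (hypothetical) state machine and return its Map output."""
    # The only input the state machine needs is the array of job names.
    execution = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps({"JobNames": job_names}),
    )
    arn = execution["executionArn"]
    while True:
        # Standard workflows run asynchronously, so poll until the status is final.
        desc = sfn_client.describe_execution(executionArn=arn)
        status = desc["status"]
        if status == "SUCCEEDED":
            # One GetJobRuns response per job name, in the same order as job_names.
            return json.loads(desc["output"])
        if status != "RUNNING":
            raise RuntimeError(f"Execution {arn} ended with status {status}")
        time.sleep(poll_seconds)
```

Usage would look like fetch_last_runs(boto3.client("stepfunctions"), state_machine_arn, ["job-a", "job-b"]). Polling keeps the example independent of the workflow type; for an Express workflow, start_sync_execution could return the output directly instead.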