AWS Glue: get job_id from within the script using pyspark

I am trying to access the AWS ETL Glue job id from the script of that job. This is the RunID that you can see in the first column in the AWS Glue Console, something like jr_5fc6d4ecf0248150067f2. How do I get it programmatically with pyspark?

Solution

As it's documented in https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-get-resolved-options.html, it's passed in as a command line argument to the Glue Job. You can access the JOB_RUN_ID and other default/reserved or custom job parameters using getResolvedOptions() function.

import sys
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv)
job_run_id = args['JOB_RUN_ID']

NOTE: JOB_RUN_ID is a default identity parameter, we don't need to include it as part of options (the second argument to getResolvedOptions()) for getting its value during runtime in a Glue Job.