
AWS Glue: get job_id from within the script using pyspark


I am trying to access the run ID of an AWS Glue ETL job from within that job's script. This is the Run ID shown in the first column of the AWS Glue console, something like jr_5fc6d4ecf0248150067f2. How do I get it programmatically with PySpark?


Solution

  • As documented at https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-get-resolved-options.html, the run ID is passed to the Glue job as a command-line argument. You can read JOB_RUN_ID, along with other default/reserved or custom job parameters, using the getResolvedOptions() function.

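    import sys
    from awsglue.utils import getResolvedOptions
    
    # getResolvedOptions() takes two arguments: the argument vector and a list of
    # option names to resolve. JOB_RUN_ID is reserved, so the list can stay empty.
    args = getResolvedOptions(sys.argv, [])
    job_run_id = args['JOB_RUN_ID']  # e.g. 'jr_5fc6d4ecf0248150067f2'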
    

    NOTE: JOB_RUN_ID is a default identity parameter, so it does not need to be listed in options (the second argument to getResolvedOptions()) for its value to be available at runtime in a Glue job.
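
    If you want to try the same lookup outside Glue (for example in a local unit test where the awsglue module is not installed), here is a minimal sketch that reads the value straight from the argument vector with the standard library. It assumes the run ID arrives as a `--JOB_RUN_ID <value>` pair on the command line. The helper name get_job_run_id is hypothetical and not part of any Glue API, and this is a stand-in for getResolvedOptions(), not a replacement for it inside a real job.

```python
import argparse
import sys


def get_job_run_id(argv=None):
    """Return the value that follows --JOB_RUN_ID in argv, or None if absent.

    Hypothetical helper for local testing; assumes the argument is passed
    as "--JOB_RUN_ID <value>".
    """
    argv = sys.argv if argv is None else argv
    # allow_abbrev=False so that only the exact flag name is accepted.
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--JOB_RUN_ID", required=False, default=None)
    # parse_known_args() leaves all other job parameters alone instead of failing.
    known, _unknown = parser.parse_known_args(argv[1:])
    return known.JOB_RUN_ID


if __name__ == "__main__":
    # Made-up argument vector for illustration, reusing the ID from the question.
    example_argv = [
        "script.py",
        "--JOB_NAME", "my_job",
        "--JOB_RUN_ID", "jr_5fc6d4ecf0248150067f2",
    ]
    print(get_job_run_id(example_argv))  # prints jr_5fc6d4ecf0248150067f2
```

    Returning None when the flag is missing lets the same script run locally without a run ID. Inside an actual Glue job, prefer getResolvedOptions() as shown above.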