
How can I pass an environment variable to a project that runs on EMR Serverless?


In my PySpark project I'm using a Python package that relies on Dynaconf, so I need to set the environment variable ENV_FOR_DYNACONF to platform. The problem is that I don't understand how to pass this environment variable to the EMR Serverless job run.

I've tried this -

os.environ['ENV_FOR_DYNACONF'] = 'platform'

At the beginning of the code, but it didn't work. In any case, I'd like to understand the right way to pass environment variables to EMR Serverless.

Can anyone help?


Solution

  • To pass environment variables to an EMR Serverless job, you can set the following Spark properties in sparkSubmitParameters:

    • spark.executorEnv.[KEY] to pass environment variables to the executors.
    • spark.emr-serverless.driverEnv.[KEY] to pass environment variables to the driver.

    Example:

    "sparkSubmitParameters": "--conf spark.emr-serverless.driverEnv.ENV_FOR_DYNACONF=platform"
    

    For more information, refer to https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/jobs-spark.html
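
If the package is also imported inside code that runs on the executors (for example in a UDF), the variable probably needs to be set on both the driver and the executors. The sketch below builds a sparkSubmitParameters string that sets both properties for each variable. The helper names, the S3 path, and the assumption that values contain no spaces are illustrative only and are not part of the original answer.

```python
def build_spark_submit_parameters(env):
    """Build --conf flags that set each variable on the driver and the executors.

    Assumes keys and values contain no whitespace (no quoting is applied).
    """
    flags = []
    for key, value in env.items():
        flags.append(f"--conf spark.emr-serverless.driverEnv.{key}={value}")
        flags.append(f"--conf spark.executorEnv.{key}={value}")
    return " ".join(flags)


def build_job_driver(entry_point, env):
    """Return a jobDriver-shaped dict for a Spark job (hypothetical helper)."""
    return {
        "sparkSubmit": {
            "entryPoint": entry_point,
            "sparkSubmitParameters": build_spark_submit_parameters(env),
        }
    }


# Hypothetical script location; replace with your own entry point.
job_driver = build_job_driver(
    "s3://my-bucket/scripts/main.py",
    {"ENV_FOR_DYNACONF": "platform"},
)
print(job_driver["sparkSubmit"]["sparkSubmitParameters"])
```

A dict of this shape can then be passed as the jobDriver argument when starting the job run, for example through boto3's emr-serverless client (start_job_run), together with your application ID and execution role ARN.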