I'm trying to use Glue ETL as a job scheduler for my Python script which also references a JSON config file.
According to https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html, there is a parameter called --extra-files
which is said to be an S3 path to additional files like configuration files. I can't seem to find this on the console when I create my job.
What I've done is upload my config file to the same S3 bucket as my Python script for Glue ETL, and I've included the config file's S3 path in the Referenced files path
parameter.
Within my script, I refer to my config file as:
import json
with open('config.json', 'r') as f:
    config = json.load(f)
There aren't any issues with the logic of my code as it all works fine when run locally.
However, when I try to run the Glue ETL job, I get a failure message saying No such file or directory: 'config.json'.
What am I doing wrong here? How can I make my use case work with Glue ETL?
These arguments can be passed as job parameters. On the console, they are found under the section Security configuration, script libraries, and job parameters (optional) when creating or editing a job.
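If the job is created from code rather than the console, the same argument can presumably be set as a default argument on the job. The following is a minimal sketch assuming boto3 is available; the job name, role name, and S3 paths are hypothetical placeholders, and whether --extra-files is honoured may depend on the job type, so treat this as an illustration rather than a verified recipe.

```python
def build_job_definition(name, role, script_s3_path, config_s3_path):
    """Build keyword arguments for the Glue CreateJob call.

    All values passed in are placeholders supplied by the caller.
    """
    return {
        "Name": name,
        "Role": role,
        "Command": {
            "Name": "glueetl",  # Spark ETL job; Python shell jobs use "pythonshell"
            "ScriptLocation": script_s3_path,
        },
        # Special parameter from the Glue documentation linked in the question
        "DefaultArguments": {"--extra-files": config_s3_path},
    }


def create_job(definition):
    """Send the definition to Glue (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    return boto3.client("glue").create_job(**definition)


# Hypothetical names and paths, for illustration only
definition = build_job_definition(
    name="my-etl-job",
    role="MyGlueServiceRole",
    script_s3_path="s3://my-bucket/scripts/my_script.py",
    config_s3_path="s3://my-bucket/scripts/config.json",
)
# create_job(definition)  # uncomment to actually create the job
```

Keeping the definition as a plain dictionary makes it easy to inspect the arguments before anything is sent to AWS.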
As per this answer, if you are using the Referenced files path variable in a Python shell job, the referenced file is placed in /tmp, which a Python shell job has no access to by default.
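Since the file may not land in the script's working directory, one defensive option is to look for it in several likely places instead of relying on a bare relative path. This is only a sketch: the helper names find_config and load_config are hypothetical, and the list of candidate directories (current directory, script directory, the temp directory, /tmp, and sys.path entries) is an assumption, not documented Glue behaviour.

```python
import json
import os
import sys
import tempfile


def find_config(filename, extra_dirs=()):
    """Return the first existing path to filename among candidate directories, or None."""
    candidates = [
        *extra_dirs,                                    # caller-supplied locations first
        os.getcwd(),                                    # where a bare open() would look
        os.path.dirname(os.path.abspath(sys.argv[0])),  # directory of the running script
        tempfile.gettempdir(),
        "/tmp",                                         # location named in the answer above
        *sys.path,                                      # assumption: files may sit next to libraries
    ]
    for directory in candidates:
        if not directory:
            continue  # skip empty sys.path entries
        path = os.path.join(directory, filename)
        if os.path.isfile(path):
            return path
    return None


def load_config(filename="config.json", extra_dirs=()):
    """Load a JSON config, failing with a clear message if it cannot be found."""
    path = find_config(filename, extra_dirs)
    if path is None:
        raise FileNotFoundError(f"{filename} not found in any candidate directory")
    with open(path, "r") as f:
        return json.load(f)
```

In the job script, config = load_config() would then replace the direct open('config.json') call. If none of the candidate directories contains the file, the error at least states that the search failed.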