I have a local AWS Glue environment with the AWS Glue libraries, Spark, PySpark, and everything installed.
I'm running the following code (literally copy-pasted into the REPL):
from awsglue.utils import getResolvedOptions
args = []
args.insert(-1, {"--JOB_NAME": "JOB_NAME"})
args.insert(-1, {"--input_file_path": "s3://things/that.csv"})
args.insert(-1, {"--output_bucket": "s3://things"})
getResolvedOptions(args, [
'--JOB_NAME',
'--input_file_path',
'--output_bucket']
)
I get the following error:
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
File "C:\Users\UBI9\bin\aws-glue-libs\PyGlue.zip\awsglue\utils.py", line 115, in getResolvedOptions
File "C:\Progra~1\Python37\lib\argparse.py", line 1781, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "C:\Progra~1\Python37\lib\argparse.py", line 1822, in _parse_known_args
option_tuple = self._parse_optional(arg_string)
File "C:\Progra~1\Python37\lib\argparse.py", line 2108, in _parse_optional
if not arg_string[0] in self.prefix_chars:
KeyError: 0
The value of args is as follows:
[{'--input_file_path': 's3://things/that.csv'}, {'--output_bucket': 's3://things'}, {'--JOB_NAME': 'JOB_NAME'}]
When I pull up the docs, it looks like args is a list of arguments. I'd assumed it was a list of key-value pairs. Is that wrong? Can I not run this function locally?
Yes, that assumption is wrong: getResolvedOptions hands args straight to argparse, which expects a flat, sys.argv-style list of strings. Indexing arg_string[0] on a dict is exactly what raises the KeyError: 0 in your traceback. Also, per the AWS documentation, --JOB_NAME is internal to AWS Glue and you should not set it yourself.
If you're running a local Glue setup and want to run the job locally, you can pass the --JOB_NAME parameter when the job is submitted to gluesparksubmit, e.g.
./bin/gluesparksubmit path/to/job.py --JOB_NAME=my-job --input_file_path='s3://path'
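Submitted this way, the flags land in the script's sys.argv as a flat list of strings, roughly like the sketch below (argparse accepts both the --flag=value and --flag value forms):

['path/to/job.py', '--JOB_NAME=my-job', "--input_file_path=s3://path"]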
The job script can then read them with getResolvedOptions:

import sys
from awsglue.utils import getResolvedOptions

# Option names are passed without the leading '--'.
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'input_file_path'])
print(args['JOB_NAME'])
print(args['input_file_path'])
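If you just want to exercise getResolvedOptions in a REPL without going through gluesparksubmit, a minimal sketch is to build the sys.argv-style list yourself (the values below are placeholders):

from awsglue.utils import getResolvedOptions

# Simulate sys.argv: a flat list of strings, with the script name first.
fake_argv = [
    'job.py',
    '--JOB_NAME', 'my-job',
    '--input_file_path', 's3://things/that.csv',
    '--output_bucket', 's3://things',
]

args = getResolvedOptions(fake_argv, ['JOB_NAME', 'input_file_path', 'output_bucket'])
print(args)  # resolved values come back as a plain dict, keyed without the leading '--'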