I have a Glue job that is not working because the DynamicFrame is not being populated from a Parquet file in S3.
I have pointed it directly at an object that contains data, but the DynamicFrame is still empty.
Example below:
input_dyf = glueContext.create_dynamic_frame.from_options(
    "s3",
    {
        "paths": ['s3://dev/.test/load_year=2023/load_month=2/load_day=22/.test.parquet'],
        "recurse": False,
        "groupFiles": "inPartition",
    },
    format="parquet",
    transformation_ctx="DataSource0",
)
I have similar Glue jobs with all the same configurations (and bookmarks off), and this is the only one that fails.
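I've tested this on my end with a similar filename and path. What I found was that the filename can't start with a period (.). A period elsewhere in the S3 path is fine (the .test/ prefix works, and so does the dot before the parquet extension), but the object name itself can't begin with one. This looks consistent with the Hadoop/Spark convention of treating files whose names begin with . or _ as hidden and skipping them when reading, which would explain the empty DynamicFrame. Working example: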
input_dyf = glueContext.create_dynamic_frame.from_options(
    "s3",
    {
        "paths": ['s3://dev/.test/load_year=2023/load_month=2/load_day=22/test.parquet'],
        "recurse": False,
        "groupFiles": "inPartition",
    },
    format="parquet",
    transformation_ctx="DataSource0",
)
Removing the leading . from the filename (.test.parquet renamed to test.parquet) seemed to solve this issue. Please test on your end and let me know.
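If you want to catch this before the job runs, here is a rough sketch of a pre-check on the path. The function name is_hidden_s3_object is made up for illustration, and the rule it encodes (object names beginning with . or _ get skipped) is my assumption based on the usual Hadoop/Spark hidden-file convention and the test above, not something I have confirmed in the Glue documentation. It only inspects the final file name, not the prefixes, since the .test/ prefix worked fine in my test.

```python
from posixpath import basename
from urllib.parse import urlparse


def is_hidden_s3_object(s3_uri: str) -> bool:
    """Return True if the object's file name starts with '.' or '_'.

    Assumption: such names are treated as hidden and skipped by the reader.
    Only the last path component is checked; prefixes are ignored.
    """
    key = urlparse(s3_uri).path          # e.g. "/.test/load_day=22/.test.parquet"
    name = basename(key.rstrip("/"))     # last component, e.g. ".test.parquet"
    return name.startswith((".", "_"))


if __name__ == "__main__":
    paths = [
        "s3://dev/.test/load_year=2023/load_month=2/load_day=22/.test.parquet",
        "s3://dev/.test/load_year=2023/load_month=2/load_day=22/test.parquet",
    ]
    for p in paths:
        status = "likely skipped" if is_hidden_s3_object(p) else "ok"
        print(f"{status}: {p}")
```

You could run this over the "paths" list before calling from_options and fail fast (or rename the object) if anything comes back as hidden.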