# Read the partitioned catalog table into a DynamicFrame,
# pruning S3 partitions to year == 2021 at read time.
S3_node1653573520077 = glueContext.create_dynamic_frame.from_catalog(
    database="database",
    push_down_predicate="(year == 2021)",
    table_name="table",
    transformation_ctx="S3_node1653573520077",
)
My goal for this AWS Glue ETL job is to load the Data Catalog table into RDS via SQL, but I'm stuck at the very first step: reading the catalog data into a DynamicFrame. The table's data is stored in S3, partitioned by year, month, day, and hour.
When I run the job, it fails with this error:

Found duplicate column(s) in the data schema and the partition schema: day, hour, month, year

I don't understand why this error occurs.
Has anyone encountered a similar situation?
I removed the partition keys from the table's column schema and it worked for me. Check whether the partition column year (and the other partition columns) is also defined as a regular column in the table.
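Concretely, the error means the catalog table declares year, month, day, and hour both as regular columns in the data schema and as partition keys. A minimal sketch of how to spot such duplicates — the schema lists below are hypothetical stand-ins for what the Data Catalog would return (in practice you would fetch them, e.g. via boto3's glue.get_table):

```python
def find_duplicate_columns(data_columns, partition_keys):
    """Return column names declared in both the data and partition schemas."""
    partition_names = {c.lower() for c in partition_keys}
    return sorted(c for c in data_columns if c.lower() in partition_names)

# Hypothetical table definition that reproduces the error: the partition
# columns were also listed as regular columns in the data schema.
data_columns = ["id", "value", "year", "month", "day", "hour"]
partition_keys = ["year", "month", "day", "hour"]

print(find_duplicate_columns(data_columns, partition_keys))
# → ['day', 'hour', 'month', 'year']
```

Any name this returns needs to be removed from the table's column list (keeping it only as a partition key) so the two schemas no longer overlap.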