amazon-web-services, aws-glue

AWS Glue ETL job: Found duplicate column(s) in the data schema and the partition schema: `day`, `hour`, `month`, `year`


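from awsglue.context import GlueContext
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())

# Read the catalog table, keeping only the partitions where year is 2021.
S3_node1653573520077 = glueContext.create_dynamic_frame.from_catalog(
    database="database",
    push_down_predicate="(year == 2021)",
    table_name="table",
    transformation_ctx="S3_node1653573520077",
)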

In this AWS Glue ETL job, I want to load data from the Data Catalog into RDS using SQL, but I am stuck at the very first step: reading the catalog table into this "DataFrame". The table's source data is stored in S3 and is partitioned by year, month, day, and hour.

When I run the job, it fails with this error:

Found duplicate column(s) in the data schema and the partition schema: day, hour, month, year

I don't quite understand why this error occurs.

Has anyone encountered a similar situation?


Solution

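  • I removed the partition_key and it worked for me. Check whether the partition column year also exists as a regular column in the data itself, since the error reports columns that appear in both the data schema and the partition schema.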
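
One way to run that check is to compare the table's regular columns with its partition keys in the Glue Data Catalog. The sketch below is only an illustration, not the answerer's actual method: `find_duplicate_columns`, `fetch_table`, and `sample_table` are hypothetical names, and the sample definition is made up. It assumes the table dict has the shape of the "Table" entry returned by boto3's Glue `get_table` call.

```python
from __future__ import annotations

from typing import Any


def find_duplicate_columns(table: dict[str, Any]) -> list[str]:
    """Return column names that appear in both the data schema and the partition keys.

    `table` is assumed to look like the "Table" entry of a Glue `get_table` response.
    Names are lowercased before comparing, so the match is case-insensitive.
    """
    columns = table.get("StorageDescriptor", {}).get("Columns", [])
    data_cols = {col["Name"].lower() for col in columns}
    part_cols = {key["Name"].lower() for key in table.get("PartitionKeys", [])}
    return sorted(data_cols & part_cols)


def fetch_table(database: str, table_name: str) -> dict[str, Any]:
    """Fetch a table definition from the Data Catalog (requires AWS credentials)."""
    import boto3  # imported lazily so the check above runs without boto3 installed

    response = boto3.client("glue").get_table(DatabaseName=database, Name=table_name)
    return response["Table"]


# Hypothetical table definition, for illustration only.
sample_table = {
    "StorageDescriptor": {
        "Columns": [
            {"Name": "id", "Type": "string"},
            {"Name": "year", "Type": "string"},
            {"Name": "month", "Type": "string"},
            {"Name": "day", "Type": "string"},
            {"Name": "hour", "Type": "string"},
        ]
    },
    "PartitionKeys": [
        {"Name": "year", "Type": "string"},
        {"Name": "month", "Type": "string"},
        {"Name": "day", "Type": "string"},
        {"Name": "hour", "Type": "string"},
    ],
}

print(find_duplicate_columns(sample_table))  # ['day', 'hour', 'month', 'year']
```

Against a real catalog, the call would be `find_duplicate_columns(fetch_table("database", "table"))`. This only inspects the catalog definition; if the columns are duplicated only inside the files in S3, the list would come back empty and the files themselves would need checking.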