I have created two partitions in a s3 bucket and loading a csv file in each of the folder. Accordingly running the Glue crawler on top of these files, which are registered as a table in Glue catalog,which Im able to query via Athena.
When I run the crawler first time on (1), it creates the Glue table/schema. Later when I upload the same data in different order to a different partition as (2) and run the crawler,it just tries to map the second file to the schema already created as part of (1), which results in data issues.
Does order of columns in Glue important? Does the crawler not automatically identify the columns based on the name, instead of the expecting in the same order (2) as of (1).
Order is important in csv files. Any change makes it think that the schema is different. However if u use parquet files, then order can be played around with