amazon-web-services amazon-s3 aws-glue

AWS Glue not detecting header in CSV


Hi, I have a bunch of CSVs in S3 and a crawler set up via AWS Glue. The crawler builds about 10 tables as it scans 10 folders, and for only 1 of them the headers are not being detected. The structure of that CSV is the same as all the others. Any advice?



Solution

  • You can create the table yourself and, instead of pointing the crawler at an S3 path, have it crawl the existing table. This is the approach to use when a crawler is not detecting the schema, especially when only the column headings are missing.


    Also check whether skip.header.line.count=1 is being added to the table properties automatically. If not, you can add it manually and update the schema to the one you require. On subsequent crawler runs, you can change the crawler's properties so that it ignores schema updates and only performs partition updates on your table.
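
As a rough illustration of the steps above, here is a minimal boto3 sketch. It is not taken from the original answer: the table, database, bucket, role and column names are hypothetical placeholders, and the SerDe settings assume a plain comma-delimited file. The two builder functions only assemble the request payloads; apply() sends them to Glue with create_table and create_crawler.

```python
import json


def build_table_input(name, s3_location, columns):
    """Build a Glue TableInput for a CSV whose first line is a header.

    `columns` is a list of (column_name, glue_type) pairs typed in by hand,
    since the crawler failed to infer them.
    """
    return {
        "Name": name,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {
            "classification": "csv",
            # Tell readers of the table to skip the header row.
            "skip.header.line.count": "1",
        },
        "StorageDescriptor": {
            "Columns": [{"Name": c, "Type": t} for c, t in columns],
            "Location": s3_location,
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                # Assumption: simple comma-delimited data without quoted fields.
                "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                "Parameters": {"field.delim": ","},
            },
        },
    }


def build_crawler_request(crawler_name, role_arn, database, table):
    """Build create_crawler arguments that crawl an existing catalog table."""
    return {
        "Name": crawler_name,
        "Role": role_arn,
        # Crawl the existing table instead of an S3 path.
        "Targets": {"CatalogTargets": [{"DatabaseName": database, "Tables": [table]}]},
        # LOG means: do not overwrite the hand-written schema.
        "SchemaChangePolicy": {"UpdateBehavior": "LOG", "DeleteBehavior": "LOG"},
        # Partitions inherit their metadata from the table.
        "Configuration": json.dumps({
            "Version": 1.0,
            "CrawlerOutput": {"Partitions": {"AddOrUpdateBehavior": "InheritFromTable"}},
        }),
    }


def apply(database, table_input, crawler_request):
    """Send both requests to Glue (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builders work without boto3 installed

    glue = boto3.client("glue")
    glue.create_table(DatabaseName=database, TableInput=table_input)
    glue.create_crawler(**crawler_request)
```

For example, apply("mydb", build_table_input("sales", "s3://example-bucket/sales/", [("id", "bigint"), ("name", "string")]), build_crawler_request("sales-crawler", "arn:aws:iam::123456789012:role/GlueRole", "mydb", "sales")) would create the table and a crawler that leaves its columns alone. All of those values are made up for illustration.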