Debugging Glue Crawler EOFException


I am using AWS Glue for the first time to crawl a large JSON file in an S3 bucket and create a new table schema. I created a new crawler and ran it manually. The crawler run completes without error, but when I check the logs, I see the following EOFException:

ERROR : Error java.io.EOFException retrieving file at s3://insurance-transparency-data/2022-09-05_796b7d27-c275-4e37-b4c8-be2e4c0c6eda_Aetna-Life-Insurance-Company.json.gz. Tables created did not infer schemas from this file.

I tried uploading a simple test JSON file to the same S3 bucket and ran the crawler against it, and it inferred the schema perfectly. So I don't think the problem is with the permissions or the crawler config.

Any suggestions on how to debug further?


Solution

  • It turns out the EOFException was related to the file being gzipped. Saving the uncompressed file to S3 and running the crawler against it worked fine.
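
If you want to check the archive before re-uploading, a minimal local sketch is below. It assumes you have downloaded the .json.gz file first; the function name and paths are hypothetical. One possible cause of an EOF error on a gzipped file is a truncated or corrupt archive, which is an assumption here and not something confirmed for this file. Python's gzip module raises EOFError in that case, so the function reports it instead of writing a bad output silently.

```python
import gzip
import shutil
import zlib


def decompress_gzip(src_path: str, dst_path: str) -> bool:
    """Decompress src_path to dst_path.

    Returns True on success, False if the archive is truncated or corrupt.
    Hypothetical helper for illustration; not part of any AWS tooling.
    """
    try:
        # Stream in chunks so a large file does not have to fit in memory.
        with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        return True
    except (EOFError, gzip.BadGzipFile, zlib.error) as exc:
        # EOFError: the stream ended before the gzip end-of-stream marker.
        # BadGzipFile / zlib.error: the data is not valid gzip.
        print(f"gzip check failed for {src_path}: {exc}")
        return False
```

If the function returns True, the decompressed file can be uploaded back to the bucket (for example with boto3's S3 client `upload_file`) and the crawler pointed at it, which matches the workaround above. If it returns False, the download or the original upload is probably worth repeating before anything else.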