
Getting an "Internal Service Exception" when running a very basic AWS Glue crawler on a file with a large number of columns


I'm trying to do some POC-testing by getting S3 parquet files to be queryable through Athena.

I'm starting with something pretty basic: a single parquet file with around 400 rows and about 800 columns. (This is an unusual storage layout, I know, but for business logic reasons there aren't a ton of other options.)

When I run a Glue crawler against it, the crawler fails with a generic Internal Service Exception error.

I tried the same thing with a smaller number of columns (everything else the same) and, lo and behold, it worked. Is this some sort of limitation I'm unaware of?

Any help would be appreciated.


Solution

  • This is not a Glue limitation but an Athena limitation. Because tables in the Data Catalog are queried through Athena, their names should follow Athena's naming rules.

    Athena table, view, database, and column names cannot contain special characters, other than underscore (_).

    More details: https://docs.aws.amazon.com/athena/latest/ug/tables-databases-columns-names.html
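
If the cause is indeed column names, a quick check before crawling might look like the sketch below. It is only an illustration: the helper names (find_invalid_columns, sanitize_columns) and the sample column names are hypothetical, and it assumes a "safe" name is one made up solely of ASCII letters, digits, and underscores.

```python
import re

# Assumption: a safe name contains only ASCII letters, digits, and underscores.
_UNSAFE = re.compile(r"[^A-Za-z0-9_]")


def find_invalid_columns(names):
    """Return the names that contain any character other than a letter, digit, or underscore."""
    return [name for name in names if _UNSAFE.search(name)]


def sanitize_columns(names):
    """Map each original name to a cleaned one, replacing disallowed characters with '_'.

    If two names clean to the same string, a numeric suffix keeps them distinct.
    """
    mapping = {}
    used = set()
    for name in names:
        cleaned = _UNSAFE.sub("_", name)
        candidate = cleaned
        suffix = 1
        while candidate in used:
            candidate = f"{cleaned}_{suffix}"
            suffix += 1
        used.add(candidate)
        mapping[name] = candidate
    return mapping


# Hypothetical column names, for illustration only.
columns = ["user_id", "avg-score", "avg score"]
print(find_invalid_columns(columns))  # ['avg-score', 'avg score']
print(sanitize_columns(columns))
```

To feed this real data, the column names could be read from the file's schema, for example with pyarrow.parquet.read_schema(path).names if pyarrow is available. The columns would then be renamed using the returned mapping and the file rewritten before the crawler runs again.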