amazon-web-services, boto3, aws-glue

Crawler is creating a table with a weird suffix in the name


We have an ETL script which reads data from the catalogue and writes it to S3 as Parquet. We're also calling a crawler to create/update the tables in Athena. However, it creates the table with a weird suffix added to the table name.

All the files in the folder that I'm crawling are Parquet files with the same schema. Also, this happens only when we call the crawler from the ETL script.

The script we used to call the crawler:

import boto3
# args is defined earlier in the ETL script (not shown here)
glue_client = boto3.client("glue", region_name=args.get("aws_region"))
glue_client.start_crawler(Name=args["crawler_name"])

Expected: table_name
Actual: table_name_31e198c8c61861f127ae06487eb14a3f


Solution

  • This happens whenever the Glue crawler encounters a duplicate table name in the Glue Data Catalog. Refer to the AWS Glue documentation, which describes this behaviour:

    If duplicate table names are encountered, the crawler adds a hash string suffix to the name.
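
To check whether this is what happened in your catalogue, you could list the tables in the database and look for names that match the expected name plus a hash-like suffix. This is only a rough sketch, not an official AWS recipe. The helper names (list_table_names, find_hash_suffixed) are made up for illustration, and the suffix pattern (32 lowercase hex characters) is an assumption inferred from the single example in the question.

```python
import re

# Assumption: the suffix is 32 lowercase hex characters, inferred only from
# the single example in the question; adjust if your suffixes look different.
HASH_SUFFIX = r"_[0-9a-f]{32}"


def list_table_names(glue_client, database_name):
    """Return all table names in a Glue database, following pagination."""
    paginator = glue_client.get_paginator("get_tables")
    names = []
    for page in paginator.paginate(DatabaseName=database_name):
        names.extend(table["Name"] for table in page["TableList"])
    return names


def find_hash_suffixed(table_names, expected_name):
    """Return names that look like expected_name plus a hash suffix."""
    pattern = re.compile(re.escape(expected_name) + HASH_SUFFIX)
    return [name for name in table_names if pattern.fullmatch(name)]
```

With the glue_client from the question, something like find_hash_suffixed(list_table_names(glue_client, "my_database"), "table_name") would return the suffixed tables, where "my_database" is a placeholder for your own database name. If the plain table_name also shows up in the list, that would be consistent with the duplicate-name explanation above.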