amazon-web-services aws-glue aws-glue-data-catalog

Glue crawler to read pattern-matched S3 files


When specifying an S3 path in an AWS Glue crawler, can we supply a pattern so that the crawler reads only files with specific names in an S3 folder, instead of reading every file under the path?

Something like s3://sample_folder/sample_file%pattern%.csv.


Solution

  • Unfortunately, Glue doesn't support regex for inclusion filters. You can specify a folder path and set exclusion rules (glob-style patterns) instead. For example, set the path to s3://sample_folder and use the exclusion pattern *.{txt,avro} to filter out all txt and avro files.

    See Include and Exclude Patterns for more details.
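
As a rough illustration of the answer above, here is a minimal sketch of how the same path and exclusion pattern might be passed to a crawler through boto3's Glue client. The crawler name, IAM role ARN, and database name are hypothetical placeholders, and the helper functions are introduced only for this example. Only the S3 path and the exclusion pattern come from the answer.

```python
def build_crawler_request(name, role_arn, database, s3_path, exclusions):
    """Build the keyword arguments for the Glue create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {
            "S3Targets": [
                # Exclusions are glob-style patterns, not regular expressions.
                {"Path": s3_path, "Exclusions": list(exclusions)}
            ]
        },
    }


def create_crawler(request):
    """Send the request to AWS Glue (requires boto3 and AWS credentials)."""
    # Imported here so the request can be built and inspected without boto3.
    import boto3

    glue = boto3.client("glue")
    return glue.create_crawler(**request)


# Placeholder values: replace the name, role ARN, and database with your own.
request = build_crawler_request(
    name="sample-crawler",
    role_arn="arn:aws:iam::123456789012:role/SampleGlueRole",
    database="sample_db",
    s3_path="s3://sample_folder",
    exclusions=["*.{txt,avro}"],
)
```

Calling create_crawler(request) would then register the crawler. Since only exclusions are available, picking out files by name means excluding everything that should not be crawled.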