aws-glue, aws-glue-data-catalog

What happens when I run a Glue crawler again without any change to the files in the S3 path it points to?


I ran an AWS Glue crawler once, then ran the same crawler again. What is the difference between the first and second runs?

When I run the same crawler again without any change to the files in S3, will it crawl all the files again?

Sometimes I have 500 files in the bucket. Will the crawler process all of them, or will it do nothing because no files were modified or added?


Solution

  • According to the AWS Glue documentation:

    If your crawler runs more than once, perhaps on a schedule, it looks for new or changed files or tables in your data store. The output of the crawler includes new tables and partitions found since a previous run.

    I assume the crawler has some mechanism for tracking S3 file changes, for example by comparing each file's last-modified date. The passage above does not specify the mechanism.
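To make the modification-date idea concrete, here is a minimal, hypothetical sketch of how change detection *could* work: keep a snapshot of each S3 key and its last-modified time from the previous run, then diff it against the current listing. This is not Glue's actual internal implementation; `Snapshot` and `diff_snapshots` are illustrative names. In practice the current listing could come from S3 `ListObjectsV2`, which returns each object's `Key` and `LastModified`.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple

# Hypothetical snapshot format: S3 key -> last-modified timestamp.
# This is an assumption for illustration, not what Glue stores internally.
Snapshot = Dict[str, datetime]


def diff_snapshots(previous: Snapshot, current: Snapshot) -> Tuple[List[str], List[str], List[str]]:
    """Return (new, changed, deleted) keys between two listings."""
    # Keys present now but absent from the previous run.
    new = sorted(k for k in current if k not in previous)
    # Keys present in both runs whose timestamp differs (treated as modified).
    changed = sorted(k for k in current if k in previous and current[k] != previous[k])
    # Keys that existed previously but are gone now.
    deleted = sorted(k for k in previous if k not in current)
    return new, changed, deleted


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    # First run: 500 files, keys data/part-000.csv .. data/part-499.csv.
    first = {f"data/part-{i:03d}.csv": t0 for i in range(500)}

    # Second run with no changes: nothing new, changed, or deleted.
    second = dict(first)
    print(diff_snapshots(first, second))

    # Third run: one file rewritten, one file added.
    third = dict(first)
    third["data/part-007.csv"] = datetime(2024, 1, 2, tzinfo=timezone.utc)
    third["data/part-500.csv"] = t0
    print(diff_snapshots(first, third))
```

Under this assumption, a rerun over 500 unchanged files yields three empty lists, so there would be nothing new to add to the Data Catalog, which matches the documented behavior of reporting only new tables and partitions found since the previous run.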