amazon-web-services amazon-s3 aws-glue

Crawl is not running in S3 event mode


When running an AWS Glue crawler that points to S3, the second log entry in CloudWatch is always:

Crawl is not running in S3 event mode

What is S3 event mode?

The name sounds like some way of getting S3 to invoke Glue for partial crawls after every object upload to the prefix. But as far as I can tell, such functionality does not exist. So what is this log entry referring to?

The closest thing I found in the Glue documentation was event-based triggers for Glue jobs, but Glue jobs are different from Glue crawlers.

Steps to reproduce

  1. Create a Glue crawler. Choose any configuration. Point it anywhere in any S3 bucket with any dataset (even an empty one).
  2. Run the crawler. It doesn't matter whether the crawl fails or succeeds.
  3. Open the logs for that crawl
  4. Look at the second log entry:

2021-07-01T20:04:39.882+10:00 [6588c8ba-57e2-46e3-94b4-1bc4dfc5957d] BENCHMARK : Running Start Crawl for Crawler my-crawler
2021-07-01T20:04:40.200+10:00 [6588c8ba-57e2-46e3-94b4-1bc4dfc5957d] INFO : Crawl is not running in S3 event mode
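
Steps 3 and 4 can also be scripted instead of clicking through the console. The following is only a minimal sketch, assuming boto3 is installed, AWS credentials are configured, and the crawler writes to the default `/aws-glue/crawlers` log group with a log stream named after the crawler. The helper names (`parse_message`, `find_event_mode_entries`, `fetch_crawler_messages`) are made up for illustration and are not part of any AWS API.

```python
import re

# Matches a crawler log message of the form "[<crawl id>] <LEVEL> : <text>",
# as seen in the log excerpt above.
LOG_LINE = re.compile(
    r"^\[(?P<crawl_id>[0-9a-f-]+)\]\s+(?P<level>\w+)\s*:\s*(?P<text>.*)$"
)


def parse_message(message):
    """Split one log message into crawl_id, level and text, or return None."""
    match = LOG_LINE.match(message.strip())
    return match.groupdict() if match else None


def find_event_mode_entries(messages):
    """Return the parsed entries whose text mentions S3 event mode."""
    hits = []
    for message in messages:
        parsed = parse_message(message)
        if parsed and "S3 event mode" in parsed["text"]:
            hits.append(parsed)
    return hits


def fetch_crawler_messages(crawler_name, limit=10):
    """Fetch the earliest log messages from the crawler's log stream.

    Assumption: logs live in the default group "/aws-glue/crawlers" and the
    stream is named after the crawler. If the stream holds several crawls,
    startFromHead=True returns the oldest events first.
    """
    import boto3  # imported lazily so the parsing helpers work without it

    logs = boto3.client("logs")
    response = logs.get_log_events(
        logGroupName="/aws-glue/crawlers",
        logStreamName=crawler_name,
        startFromHead=True,
        limit=limit,
    )
    return [event["message"] for event in response["events"]]
```

For example, `find_event_mode_entries(fetch_crawler_messages("my-crawler"))` would return the parsed "S3 event mode" entries for the crawler used in the logs above, if there are any.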

Solution

  • AWS Support gave me an answer.

    S3 event mode is functionality that is available only internally at AWS. As I suspected, it means that S3 triggers a crawl for every file upload. This functionality was not publicly available at the time of writing.