I would like to set my Glue crawler to crawl only new folders in my S3 bucket. Based on the documentation, it looks like I want to set RecrawlBehavior to CRAWL_NEW_FOLDERS_ONLY, but I can't find any guidance on how to do that in a CloudFormation template.
This is my crawler's configuration property now, but my use of RecrawlBehavior is invalid:
Configuration: "{\"Version\":1.0,\"RecrawlBehavior\":\"CRAWL_NEW_FOLDERS_ONLY\",\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"},\"Tables\":{\"AddOrUpdateBehavior\":\"MergeNewColumns\"}}}"
As far as I understand, the incremental crawl policy is a relatively new feature in Glue and is not yet supported in CloudFormation.
As a workaround, you can create the crawler with CloudFormation and then use the AWS CLI to update its RecrawlPolicy property.
When you create a crawler with CloudFormation and retrieve its properties through the CLI, "RecrawlPolicy" has "RecrawlBehavior" set to "CRAWL_EVERYTHING". You can use the command below to switch it to incremental crawls (crawl new folders only).
aws glue update-crawler \
  --name <crawlername> \
  --recrawl-policy '{"RecrawlBehavior": "CRAWL_NEW_FOLDERS_ONLY"}' \
  --schema-change-policy '{"UpdateBehavior":"LOG","DeleteBehavior":"LOG"}'
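If you would rather script this step (for example, right after the stack deploys), the same update can probably be made with boto3. This is a minimal sketch, not a tested deployment step: the function name `enable_incremental_crawl` and the crawler name in the usage comment are hypothetical, and it assumes your credentials are allowed to call `glue:UpdateCrawler` and `glue:GetCrawler`.

```python
def enable_incremental_crawl(glue_client, crawler_name):
    """Switch an existing crawler to crawl new folders only.

    glue_client is expected to be a boto3 Glue client.
    Returns the RecrawlBehavior reported by Glue after the update.
    """
    # Same settings as the CLI command: the recrawl policy plus a
    # schema change policy of LOG for both update and delete behavior.
    glue_client.update_crawler(
        Name=crawler_name,
        RecrawlPolicy={"RecrawlBehavior": "CRAWL_NEW_FOLDERS_ONLY"},
        SchemaChangePolicy={"UpdateBehavior": "LOG", "DeleteBehavior": "LOG"},
    )
    # Read the crawler back to confirm what Glue now reports.
    crawler = glue_client.get_crawler(Name=crawler_name)["Crawler"]
    return crawler.get("RecrawlPolicy", {}).get("RecrawlBehavior")


# Example usage (hypothetical crawler name; requires boto3 and AWS credentials):
#   import boto3
#   print(enable_incremental_crawl(boto3.client("glue"), "my-crawler"))
```

Passing the client in as an argument keeps the function easy to try against a stub before pointing it at a real account.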