What's the correct way to add a daily schedule to an AWS Glue crawler via Terraform? I followed the official docs and tried this:
resource "aws_glue_crawler" "test" {
  name          = "test"
  description   = "Ctest"
  role          = data.aws_iam_role.test.arn
  database_name = "test"
  schedule      = "cron(0 23 * * *)"
  schema_change_policy {
    delete_behavior = "LOG"
    update_behavior = "LOG"
  }
  s3_target {
    path        = "s3://test//"
    sample_size = 10
  }
  recrawl_policy {
    recrawl_behavior = "CRAWL_NEW_FOLDERS_ONLY"
  }
  configuration = jsonencode(
    {
      CreatePartitionIndex = true
      Version              = 1
    }
  )
}
Why do I get this error when applying the Terraform changes?
Message_: "Invalid schedule cron expression: cron(0 23 * * *)"
I tested the cron expression with online tools and it appears to be valid.
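If you take a closer look at the docs, you will see that AWS cron expressions differ from standard Unix cron in two ways: they have six fields instead of five (the last one is Year), and you cannot specify both Day-of-month and Day-of-week, so one of those two fields must be "?". Online tools usually validate the five-field Unix format, which is why your expression passed there. For example, this should work: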
resource "aws_glue_crawler" "test" {
  name          = "test"
  description   = "Ctest"
  role          = data.aws_iam_role.test.arn
  database_name = "test"
  # Six fields; "?" in Day-of-week because Day-of-month is "*"
  schedule      = "cron(0 23 * * ? *)"
  schema_change_policy {
    delete_behavior = "LOG"
    update_behavior = "LOG"
  }
  s3_target {
    path        = "s3://test//"
    sample_size = 10
  }
  recrawl_policy {
    recrawl_behavior = "CRAWL_NEW_FOLDERS_ONLY"
  }
  configuration = jsonencode(
    {
      CreatePartitionIndex = true
      Version              = 1
    }
  )
}
With this expression, the crawler runs every day at 23:00 UTC (Glue evaluates crawler schedules in UTC, not in your local time zone). From the AWS docs on cron expressions:
cron(Minutes Hours Day-of-month Month Day-of-week Year)
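Applied to the expression above, the six fields map as shown below. This is only an annotated copy of the answer's schedule attribute; the field names come from the format line above and the comments are explanatory.

```hcl
# cron(Minutes Hours Day-of-month Month Day-of-week Year)
#   0   -> Minutes:      at minute 0
#   23  -> Hours:        at hour 23 (UTC)
#   *   -> Day-of-month: every day of the month
#   *   -> Month:        every month
#   ?   -> Day-of-week:  no specific value (required, since Day-of-month is "*")
#   *   -> Year:         every year
schedule = "cron(0 23 * * ? *)"
```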
If you want the crawler to run only on certain days of the week instead of every day, see the Examples section in the AWS docs.
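As a sketch, here are a few alternative schedules in the same six-field format. The local value names are hypothetical; each expression follows the rule that one of Day-of-month or Day-of-week must be "?", and all times are UTC.

```hcl
# Hypothetical schedule values in the AWS format
# cron(Minutes Hours Day-of-month Month Day-of-week Year).
locals {
  # 23:00 UTC, Monday through Friday only ("?" in Day-of-month)
  weekdays_at_2300 = "cron(0 23 ? * MON-FRI *)"

  # 23:00 UTC on the 1st of every month ("?" in Day-of-week)
  monthly_at_2300 = "cron(0 23 1 * ? *)"

  # 06:30 UTC every Sunday ("?" in Day-of-month)
  sundays_at_0630 = "cron(30 6 ? * SUN *)"
}
```

Any of these can then be referenced from the crawler resource, e.g. schedule = local.weekdays_at_2300.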