amazon-web-services terraform aws-glue terraform-provider-aws

Set a schedule for an AWS Glue crawler

What's the correct way to add a daily schedule to an AWS Glue crawler via Terraform? I followed the official docs and tried this:

resource "aws_glue_crawler" "test" {
  name              = "test"
  description       = "Ctest"
  role              = data.aws_iam_role.test.arn
  database_name     = "test"
  schedule          = "cron(0 23 * * *)" 

  schema_change_policy {
    delete_behavior = "LOG"
    update_behavior = "LOG"
  }

  s3_target {
    path        = "s3://test//"
    sample_size = 10
  }

  recrawl_policy {
    recrawl_behavior = "CRAWL_NEW_FOLDERS_ONLY"
  }

  configuration = jsonencode(
    {
      CreatePartitionIndex = true
      Version = 1
    }
  )
}

Why do I get this error when applying the Terraform changes?

Message_: "Invalid schedule cron expression: cron(0 23 * * *)"

I tested the cron expression via online tools and it seems to be valid.

Solution

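If you take a closer look at the docs, you will see that the cron format differs slightly from the one in your configuration. AWS cron expressions have six fields, not the five of standard Unix cron that most online tools validate: there is an extra Year field at the end. You also cannot specify both Day-of-month and Day-of-week, so when one of them has a value (or *), the other must be a question mark (?). For example, this should work: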

resource "aws_glue_crawler" "test" {
  name              = "test"
  description       = "Ctest"
  role              = data.aws_iam_role.test.arn
  database_name     = "test"
  schedule          = "cron(0 23 * * ? *)" 

  schema_change_policy {
    delete_behavior = "LOG"
    update_behavior = "LOG"
  }

  s3_target {
    path        = "s3://test//"
    sample_size = 10
  }

  recrawl_policy {
    recrawl_behavior = "CRAWL_NEW_FOLDERS_ONLY"
  }

  configuration = jsonencode(
    {
      CreatePartitionIndex = true
      Version = 1
    }
  )
}

In this case, the crawler would run every day at 23:00 (11 PM) UTC. From the AWS docs for the cron expressions:

cron(Minutes Hours Day-of-month Month Day-of-week Year)
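To catch a five-field expression before it reaches the Glue API, one option is a Terraform variable with a validation block. This is only a sketch: the variable name crawler_schedule is hypothetical, and the regex checks nothing more than the cron(...) wrapper and the field count, so an expression can pass this check and still be rejected by AWS.

```hcl
# Hypothetical variable; the name is an assumption, not part of the original configuration.
variable "crawler_schedule" {
  type        = string
  description = "AWS cron expression for the Glue crawler schedule (UTC)."
  default     = "cron(0 23 * * ? *)"

  validation {
    # Requires exactly six whitespace-separated fields inside cron(...).
    # It does not verify that each field holds a legal value.
    condition     = can(regex("^cron\\(\\S+(\\s+\\S+){5}\\)$", var.crawler_schedule))
    error_message = "The schedule must be cron(...) with six fields: Minutes Hours Day-of-month Month Day-of-week Year."
  }
}
```

The crawler resource would then use schedule = var.crawler_schedule, and a value such as cron(0 23 * * *) would fail at plan time instead of during apply.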

If you don't want the crawler to run every day of the week, take a look at the Examples section in the AWS docs.
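As a rough illustration of a schedule that skips weekends, the day-of-week field can carry the range while day-of-month becomes the question mark. The local names below are hypothetical and exist only to put the two variants side by side:

```hcl
locals {
  # Every day at 23:00 UTC: Day-of-month is *, so Day-of-week must be ?.
  daily_schedule = "cron(0 23 * * ? *)"

  # Monday through Friday at 23:00 UTC: Day-of-week is set, so Day-of-month must be ?.
  weekday_schedule = "cron(0 23 ? * MON-FRI *)"
}
```

Either value can be assigned to the schedule argument of the aws_glue_crawler resource, for example schedule = local.weekday_schedule.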