Tags: amazon-web-services, terraform, aws-glue, terraform-provider-aws

Set a schedule for an AWS Glue crawler


What's the correct way to add a daily schedule to an AWS Glue crawler via Terraform? I followed the official docs and tried this:

resource "aws_glue_crawler" "test" {
  name              = "test"
  description       = "Ctest"
  role              = data.aws_iam_role.test.arn
  database_name     = "test"
  schedule          = "cron(0 23 * * *)" 

  schema_change_policy {
    delete_behavior = "LOG"
    update_behavior = "LOG"
  }

  s3_target {
    path        = "s3://test//"
    sample_size = 10
  }

  recrawl_policy {
    recrawl_behavior = "CRAWL_NEW_FOLDERS_ONLY"
  }

  configuration = jsonencode(
    {
      CreatePartitionIndex = true
      Version = 1
    }
  )
}

Why do I get this error when applying the Terraform changes?

Message_: "Invalid schedule cron expression: cron(0 23 * * *)"

I tested the cron expression via online tools and it seems to be valid.


Solution

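  • The docs use a slightly different cron format from the one in your configuration. AWS cron expressions take six fields rather than the five of standard Unix cron (there is an extra Year field at the end), and Day-of-month and Day-of-week cannot both be specified in the same expression: when one of them has a value, the other must be a question mark (?). For example, this should work: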

    resource "aws_glue_crawler" "test" {
      name              = "test"
      description       = "Ctest"
      role              = data.aws_iam_role.test.arn
      database_name     = "test"
      schedule          = "cron(0 23 * * ? *)" 
    
      schema_change_policy {
        delete_behavior = "LOG"
        update_behavior = "LOG"
      }
    
      s3_target {
        path        = "s3://test//"
        sample_size = 10
      }
    
      recrawl_policy {
        recrawl_behavior = "CRAWL_NEW_FOLDERS_ONLY"
      }
    
      configuration = jsonencode(
        {
          CreatePartitionIndex = true
          Version = 1
        }
      )
    }
    

    In this case, the crawler would run every day at 11:00 PM UTC (AWS evaluates these schedules in UTC). From the AWS docs for cron expressions:

    cron(Minutes Hours Day-of-month Month Day-of-week Year)

    If you don't want to run the crawler every day of the week, take a look at the Examples section in the AWS docs.
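
As an illustration of a schedule that does not run every day, here is a minimal sketch of a weekdays-only variant. It assumes the same 11:00 PM UTC start time as above; the resource label "weekdays", the crawler name "test-weekdays", and the S3 path are placeholders chosen for this example and are not from the question. Because Day-of-week now has a value, Day-of-month becomes the question mark.

```hcl
# Hypothetical variant of the crawler above: runs Monday to Friday only.
resource "aws_glue_crawler" "weekdays" {
  name          = "test-weekdays" # placeholder name
  role          = data.aws_iam_role.test.arn
  database_name = "test"

  # Field order: Minutes Hours Day-of-month Month Day-of-week Year
  # Day-of-week is MON-FRI, so Day-of-month must be "?".
  schedule = "cron(0 23 ? * MON-FRI *)"

  s3_target {
    path = "s3://test/" # placeholder path
  }
}
```

Running terraform plan before applying lets you review the schedule string; the expression itself is only validated by AWS when the crawler is created or updated, which is where the error in the question came from.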