Tags: google-cloud-platform, stackdriver, google-cloud-stackdriver, google-cloud-monitoring

How to set a GCP Cloud Monitoring (Stackdriver) alert policy period greater than 24 hours?


Currently, 24 hours is the longest period for which a Cloud Monitoring (formerly Stackdriver) alert policy can be set.

However, a daily activity such as a database backup might take a little more or less time each day (e.g. 1 hour 10 minutes one day, 1 hour 12 minutes the next). In that case, the completion indicator might not appear until 24 hours and 2 minutes after the prior indicator. Cloud Monitoring will then issue an alert, because the gap is 2 minutes over the alerting window limit.
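The arithmetic behind this false alert can be sketched as follows. The 00:00 start time and the calendar dates are assumptions made for illustration; only the two run times (1 hour 10 minutes and 1 hour 12 minutes) come from the example above.

```python
from datetime import datetime, timedelta

# Alert window limit described above.
WINDOW = timedelta(hours=24)

# Hypothetical schedule: the job starts at 00:00 every day.
start_day1 = datetime(2024, 1, 1, 0, 0)
start_day2 = start_day1 + timedelta(days=1)

# Completion times using the run times from the example.
done_day1 = start_day1 + timedelta(hours=1, minutes=10)
done_day2 = start_day2 + timedelta(hours=1, minutes=12)

# Gap between two consecutive completion indicators.
gap = done_day2 - done_day1
overshoot = gap - WINDOW

print("gap between completions:", gap)
print("over the window by:", overshoot)
print("completion-only alert fires:", gap > WINDOW)
```

Even though both runs succeeded, the gap between completion indicators exceeds the 24-hour window, so a policy that watches only the completion indicator fires.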

Is there a way to better handle the variance in these alerts, such as a 25-hour lookback period?


Solution

  • I found a workaround to this problem.

    1. Create a metric for when your job starts (e.g. started_metric)
    2. Create a metric for when your job finishes (e.g. completed_metric)

    Now create a two-part alert policy:

    1. Require that started_metric occurs at least once per 24 hours
    2. Require that completed_metric occurs at least once per 24 hours
    3. Trigger only if both (1) and (2) are violated (i.e. neither metric has been seen for more than 24 hours)

    This works around the 24-hour job jitter issue: the completion indicator might arrive more than 24 hours after the previous one, but the job should always start (e.g. from a cron job) within 24 hours of the previous start.
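The combined condition can be illustrated with a small simulation. This is only a sketch of the logic, not Cloud Monitoring configuration: the helper names `absent` and `should_alert`, the 00:00 schedule, and the dates are all hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)


def absent(last_seen, now, window=WINDOW):
    """Return True if the metric was last seen more than `window` ago."""
    return now - last_seen > window


def should_alert(last_started, last_completed, now):
    """Fire only when BOTH started_metric and completed_metric are absent."""
    return absent(last_started, now) and absent(last_completed, now)


# Hypothetical schedule: the job starts at 00:00 each day.
day1 = datetime(2024, 1, 1, 0, 0)
day2 = day1 + timedelta(days=1)
day1_done = day1 + timedelta(hours=1, minutes=10)

# Check at 01:11 on day 2, which is 24 h 1 min after the day 1 completion.
now = day2 + timedelta(hours=1, minutes=11)

# Scenario 1 (jitter): day 2 started on time but is still running.
jitter_alert = should_alert(last_started=day2, last_completed=day1_done, now=now)

# Scenario 2 (missed run): the day 2 job never started.
missed_alert = should_alert(last_started=day1, last_completed=day1_done, now=now)

# For comparison: a policy that watches only completed_metric.
completed_only_alert = absent(day1_done, now)

print("jitter, two-part policy:", jitter_alert)
print("missed run, two-part policy:", missed_alert)
print("jitter, completion-only policy:", completed_only_alert)
```

In the jitter scenario, the recent start indicator keeps the two-part policy quiet, whereas the completion-only check would fire. When the run is missed entirely, both indicators are stale and the alert triggers.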