google-cloud-platform stackdriver google-cloud-stackdriver google-cloud-monitoring

How to set a GCP Cloud Monitoring (Stackdriver) alert policy period greater than 24 hours?

Currently, 24 hours is the longest period that can be set on a Cloud Monitoring (formerly Stackdriver) alert policy.

However, a daily activity such as a database backup might take a little more or less time each day (e.g. it runs in 1 hour 10 minutes one day and 1 hour 12 minutes the next). In that case, the completion indicator might not appear until 24 hours and 2 minutes after the prior one. Cloud Monitoring then issues an alert, because the gap is 2 minutes over the alerting window limit.

Is there a way to better handle the variance in these alerts, such as a 25-hour lookback period?

Solution

I found a workaround for this problem.

Create a metric for when your job starts (e.g. started_metric)
Create a metric for when your job finishes (e.g. completed_metric)

Now create a two-part alert policy:

1. Require that started_metric occurs at least once per 24 hours.
2. Require that completed_metric occurs at least once per 24 hours.
3. Trigger only if both (1) and (2) are violated, i.e. neither metric has been seen for more than 24 hours.
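As a rough illustration, the two-part policy could be written as two metric-absence conditions joined with an AND combiner. The sketch below only builds the policy body as JSON; it does not call any Google Cloud API. It assumes the two metrics are log-based metrics (hence the `logging.googleapis.com/user/` prefix) and uses `gce_instance` as a placeholder resource type. Both are assumptions, not something stated above, so adjust them to match how your metrics are actually defined.

```python
import json

DAY_SECONDS = 24 * 60 * 60  # the 24-hour limit discussed in the question

# Assumption: log-based metrics reported against a GCE instance.
METRIC_PREFIX = "logging.googleapis.com/user/"
RESOURCE_TYPE = "gce_instance"  # placeholder resource type


def absence_condition(display_name, metric_name):
    """Build one condition that is met when the metric reports no data for 24 hours."""
    metric_type = METRIC_PREFIX + metric_name
    return {
        "displayName": display_name,
        "conditionAbsent": {
            "filter": f'metric.type = "{metric_type}" AND resource.type = "{RESOURCE_TYPE}"',
            "duration": f"{DAY_SECONDS}s",
        },
    }


policy = {
    "displayName": "Daily job neither started nor completed in 24 hours",
    # AND: an incident opens only when every condition is met at the same time.
    "combiner": "AND",
    "conditions": [
        absence_condition("No start in 24 hours", "started_metric"),
        absence_condition("No completion in 24 hours", "completed_metric"),
    ],
}

print(json.dumps(policy, indent=2))
```

The printed JSON is meant as a starting point for a policy definition; check the field names and the allowed absence duration against the current AlertPolicy reference before using it.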

This works around the 24-hour job jitter issue: the completion indicator might arrive slightly more than 24 hours after the previous one, but the job should always start within 24 hours (e.g. when it is launched by a cron job).
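To see why this helps, here is a small simulation of the timing from the question. It is only a sketch of the alerting logic, not of how Cloud Monitoring evaluates conditions internally; the dates, the 02:00 start time, and the `is_absent` helper are made up for illustration.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)


def is_absent(last_seen, now, window=WINDOW):
    """True if the last event is older than the alerting window."""
    return now - last_seen > window


# Day 1: job starts at 02:00 and takes 1 hour 10 minutes.
day1_start = datetime(2024, 1, 1, 2, 0)
day1_done = day1_start + timedelta(hours=1, minutes=10)

# Day 2: job starts at 02:00 again but takes 1 hour 12 minutes.
day2_start = day1_start + timedelta(days=1)
day2_done = day2_start + timedelta(hours=1, minutes=12)

# Evaluate at 03:11 on day 2, one minute before the second run finishes.
now = day2_done - timedelta(minutes=1)

completed_absent = is_absent(day1_done, now)  # 24h01m since last completion
started_absent = is_absent(day2_start, now)   # 1h11m since last start

single_condition_alert = completed_absent
two_part_alert = started_absent and completed_absent

print("completion-only policy fires:", single_condition_alert)
print("two-part policy fires:", two_part_alert)

# If the job never starts on day 2, both metrics go stale and the alert fires.
missed_run_alert = is_absent(day1_start, now) and is_absent(day1_done, now)
print("two-part policy fires on a missed run:", missed_run_alert)
```

The completion-only policy raises a false alarm because of the two extra minutes, while the two-part policy stays quiet and still fires when the run is missed entirely.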