I am playing around with Stackdriver Monitoring alerting and, having used Prometheus/Alertmanager a bit, I am quite disappointed with the seeming lack of options.
For instance, I have a resource that emits one datapoint per day: an epoch-second timestamp indicating the age of a certain resource. I would like to create an alert that compares the datapoint with the current time and fires if the resource is too old.
In Prometheus it would be expressed like this:
- alert: TooOldAlert
  expr: sum(time() - datapoint_epoch_second) BY (datapoint_group) > 48 * 60 * 60
  for: 1m
  labels:
    severity: critical
So if the datapoint is more than 48 hours old, I will be alerted.
There just doesn't seem to be such an option in Stackdriver Monitoring alerting. I checked the API / programmatic interface too, but came up short there as well.
TL;DR: Do built-in functions exist at all in Stackdriver Monitoring alerting?
Stackdriver Alerting does have built-in functions, but they cover aggregating, filtering, comparing, and creating ratios. See docs for alerting policies here. An example for setting them up is here.
However, there is no time() function that you can use to get epoch time in these expressions. This is the rub.
In order to port your age alert into Stackdriver, one approach is to change the log "up-stream", so that your service emits a log entry stating how old the resource is relative to "now". In this case, you can filter on the age directly, without the alert needing to know what time it is now.
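As a rough illustration of the "up-stream" idea, the service could compute the age itself at emit time and log it as a number. This is only a sketch: the function name `build_age_log` and the field names `age_seconds` and `datapoint_group` are made up for the example, and it assumes your runtime ships JSON lines written to stdout into Stackdriver Logging as structured entries.

```python
import json
import time


def build_age_log(resource_epoch_second, group, now=None):
    """Return a JSON log line carrying the resource age in seconds.

    `now` can be injected for testing; by default the current time is used.
    """
    now = time.time() if now is None else now
    entry = {
        # Hypothetical field names, chosen for this example only.
        "datapoint_group": group,
        "age_seconds": int(now - resource_epoch_second),
    }
    return json.dumps(entry, sort_keys=True)


if __name__ == "__main__":
    # Example: a resource that was last refreshed one hour ago.
    print(build_age_log(time.time() - 3600, "backups"))
```

With the age already in the log entry, the alert condition only needs a plain threshold comparison (for example, age above 48 * 60 * 60 = 172800 seconds), which needs no time() function.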
If you cannot change the log structure in your service, you could choose to capture the log "down-stream" and do a transformation on it. One approach is to sink this Stackdriver log to Pub/Sub, and have that event trigger a Cloud Function. An example guide is here.
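A minimal sketch of what such a Cloud Function might look like, assuming a Pub/Sub-triggered background function with the `(event, context)` signature, where the exported log entry arrives as base64-encoded JSON in `event["data"]`. The key `datapoint_epoch_second` inside `jsonPayload` is an assumption about how your service logs the value, and `handle_log` is a hypothetical entry point name; adjust both to your setup.

```python
import base64
import json
import time

MAX_AGE_SECONDS = 48 * 60 * 60  # same 48-hour threshold as the Prometheus rule


def age_from_pubsub_event(event, now=None):
    """Decode a Pub/Sub-delivered log entry and return the resource age in seconds."""
    now = time.time() if now is None else now
    log_entry = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    # Assumed location of the timestamp; depends on how the service logs it.
    epoch_second = float(log_entry["jsonPayload"]["datapoint_epoch_second"])
    return now - epoch_second


def handle_log(event, context):
    """Hypothetical entry point: re-emit the age so an alert can threshold on it."""
    age = age_from_pubsub_event(event)
    print(json.dumps({"age_seconds": int(age), "too_old": age > MAX_AGE_SECONDS}))
```

The function only does the subtraction that time() would have done in Prometheus; the re-emitted `age_seconds` value is what you would then alert on.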