I would like to create an idle or uptime metric for all the Dataproc clusters I am running, but from what I've seen in Stackdriver, I was not able to do so. My scenario is that I have scheduled Dataproc jobs that run daily, and after the jobs are done I delete the cluster. I would like to get an email alert if a Dataproc cluster has been idle for an hour, or if a cluster's uptime is more than 24 (or even 20) hours.
Thanks.
There are 3 items in your question I'd like to address separately:
About alerting on an idle metric: Dataproc does not expose such a metric, and I will file a feature request for us to add one. In the meantime, you can approximate idleness by detecting when the metric dataproc.googleapis.com/cluster/yarn/containers goes down to 0 for an hour or so.
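To make the "0 containers for an hour" idea concrete, here is a minimal sketch of the check itself. It assumes you have already pulled (timestamp, container count) samples for one cluster from the monitoring backend; the function names and the sample format are mine, not part of any Dataproc or Stackdriver API.

```python
from datetime import datetime, timedelta


def idle_duration(samples, now):
    """Return how long the cluster has reported 0 YARN containers.

    `samples` is an iterable of (timestamp, container_count) pairs
    (hypothetical format, assumed to be fetched elsewhere).
    """
    idle_since = None
    for ts, count in sorted(samples):
        if count > 0:
            # Any running container resets the idle streak.
            idle_since = None
        elif idle_since is None:
            # First zero sample of the current idle streak.
            idle_since = ts
    if idle_since is None:
        return timedelta(0)
    return max(now - idle_since, timedelta(0))


def is_idle(samples, now, threshold=timedelta(hours=1)):
    """True if the current zero-container streak has lasted at least `threshold`."""
    return idle_duration(samples, now) >= threshold
```

In practice you would probably express the same condition as an alerting policy on the metric instead of polling, but the logic is the same: the streak only counts from the first zero sample after the last busy one.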
Regarding a cluster being idle for an hour or alive for 24 hours: this can be automated by Dataproc itself via the Scheduled Deletion feature: gcloud beta dataproc clusters create ... --max-age=24h --max-idle=1h
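If you create clusters through the API rather than gcloud, my understanding is that the same two settings live in the cluster config's lifecycle section as `autoDeleteTtl` and `idleDeleteTtl`, written as durations in seconds. The sketch below only builds that fragment from gcloud-style values; the helper names are hypothetical, and you should check the field names against the API version you are calling.

```python
import re

# Seconds per unit for simple gcloud-style durations such as "24h" or "1h".
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(duration):
    """Convert a simple duration like '24h' to a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def lifecycle_config(max_age, max_idle):
    """Build a lifecycle fragment (assumed field names) for a cluster config."""
    return {
        "autoDeleteTtl": f"{to_seconds(max_age)}s",   # counterpart of --max-age
        "idleDeleteTtl": f"{to_seconds(max_idle)}s",  # counterpart of --max-idle
    }
```

For example, `lifecycle_config("24h", "1h")` gives the 24-hour maximum age and 1-hour idle limit from the command above.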
About daily jobs: I think you could sidestep questions #1 and #2 entirely and leverage Workflow Templates to manage cluster creation, job execution, and teardown. If your automation goes through API clients, or you need to pass different parameters on each invocation, the InstantiateInline method will do the trick.
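As a rough illustration of the per-invocation approach, here is a sketch that builds a workflow template body with a managed cluster and a single PySpark step, which you would then send to InstantiateInline with whatever client you use. The step id, cluster name, and file URI are placeholders I made up, and the empty cluster config is an assumption you would fill in for your own setup.

```python
import json


def build_inline_template(cluster_name, main_python_file_uri, args):
    """Build a workflow template body for a one-off (inline) instantiation.

    The managed cluster is created for the workflow and torn down when
    the jobs finish, so nothing is left running afterwards.
    """
    return {
        "placement": {
            "managedCluster": {
                "clusterName": cluster_name,
                "config": {},  # assumption: fill in machine types, workers, etc.
            }
        },
        "jobs": [
            {
                "stepId": "daily-job",  # hypothetical step name
                "pysparkJob": {
                    "mainPythonFileUri": main_python_file_uri,
                    "args": list(args),  # parameters that change per run
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    body = build_inline_template(
        "daily-cluster", "gs://my-bucket/jobs/daily.py", ["--date", "2020-01-01"]
    )
    print(json.dumps(body, indent=2))
```

Because the body is rebuilt on every run, passing a different date or input path is just a matter of changing `args`.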