prometheus grafana promql

Is there a way to get the duration of all currently running k8s Jobs using Prometheus?


I'm trying to set up an alert in my Kubernetes cluster that notifies me if a Job is running for an extended period. However, I'm having trouble finding the appropriate metric that directly shows the current duration of the Job. I've been looking for something like kube_job_duration, but it doesn't seem to exist.

I originally tried this, but it only works for finished Jobs, since kube_job_status_completion_time is only set once a Job has completed:

kube_job_status_completion_time - on(job_name) kube_job_status_start_time > 39600

Is there an alternative approach I can take to achieve this? Could I possibly leverage the time() function in PromQL to calculate the Job duration? Any guidance on how to effectively create this kind of alert would be greatly appreciated!


Solution

  • While typing this question I realized the answer, but I figured I might as well share it anyway. The query I used is:

    time() - ((kube_job_status_active == 1) * on(job_name) kube_job_status_start_time) > 39600
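The query above works because the kube_job_status_active == 1 filter keeps only series whose value is 1, so multiplying by kube_job_status_start_time leaves the start time unchanged, and time() minus that is the elapsed run time in seconds (39600 s is 11 hours). A possible variant, offered as a sketch rather than a tested replacement: kube_job_status_active counts active pods, so a Job with parallelism above 1 would not match == 1, and matching on job_name alone can collide when two namespaces contain a Job with the same name. The version below assumes both metrics come from kube-state-metrics and carry the namespace and job_name labels:

```promql
# Elapsed seconds for every Job that has been running longer than 11 hours.
# The left side computes the age of each Job from its start time;
# "and" keeps only the Jobs that still have at least one active pod.
(time() - kube_job_status_start_time > 39600)
  and on(namespace, job_name)
(kube_job_status_active > 0)
```

Because and keeps the left-hand sample values, the result is still the running duration in seconds, so it can be used directly as an alert expression or shown in a Grafana panel.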