Our team has built a Docker image. We want to receive an alert if the pod running this image fails. (For this we are using Prometheus Alertmanager and kube-state-metrics.)
A different team is creating the job that runs that image (note they're doing it via something akin to Argo). In order to get the alert we want, we are asking that team to include a specific label, i.e. the pod created by the job will have a label we can use in PromQL to create an alert when that pod fails.
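For context, the alert we have in mind looks roughly like the sketch below. It assumes a hypothetical `alert-team` pod label, which kube-state-metrics exposes on `kube_pod_labels` as `label_alert_team` (recent versions only expose labels that are allow-listed via `--metric-labels-allowlist`):

```yaml
# Sketch of a PrometheusRule; "alert-team" and "our-team" are
# hypothetical names, not something prescribed by Kubernetes.
groups:
  - name: our-image-alerts
    rules:
      - alert: OurJobPodFailed
        expr: |
          kube_pod_status_phase{phase="Failed"}
            * on (namespace, pod) group_left(label_alert_team)
              kube_pod_labels{label_alert_team="our-team"}
          > 0
        for: 1m
        annotations:
          summary: "Pod {{ $labels.pod }} running our image has failed"
```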
The only way we can think of to enforce that the correct label is used is to check for it from within the container and fail with an error message telling us the label is missing: either via the Downward API (but that is yet another requirement for the team running the job), or more likely by just running `kubectl get pods -l ...`, since this container already uses kubectl for something else.
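Concretely, the in-container check we have in mind would be something like the sketch below. It assumes the pod name equals the container hostname (the Kubernetes default), that the pod's service account may `get` pods in its namespace, and that `alert-team` is the label key (a hypothetical name):

```shell
# Fail fast when the given label value is empty. Split out as a
# function so the entrypoint can reuse it.
check_label() {
  if [ -z "$1" ]; then
    echo "ERROR: pod is missing the 'alert-team' label needed for failure alerts" >&2
    return 1
  fi
  return 0
}

# In the container entrypoint, look the label up with kubectl and
# bail out before doing any real work:
#   check_label "$(kubectl get pod "$(hostname)" \
#       -o jsonpath='{.metadata.labels.alert-team}')" || exit 1
```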
There is a debate in our team about whether this is bad practice. Is it an anti-pattern for a container to insist on a pod label? Is there a cleaner design for a situation like this?
In my opinion the idiomatic way of enforcing that certain fields exist in Kubernetes is to create a dynamic mutating admission controller.
https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/ https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#mutatingadmissionwebhook
I know it may sound a bit complex, but trust me, it's really simple. In essence, an admission controller is just a webhook endpoint (a piece of code) that can change objects as they are created and enforce a certain state on them.
BTW, you can also use a validating webhook and simply disallow creation of pods that do not contain the label you want, with a corresponding relevant error message.
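The core of such a validating webhook is small. Here is a sketch of just the decision logic (the HTTP server and TLS plumbing are omitted); `alert-team` is a hypothetical label key, and the dict shapes follow the `admission.k8s.io/v1` AdmissionReview schema:

```python
REQUIRED_LABEL = "alert-team"  # hypothetical label key

def validate_pod(admission_review: dict) -> dict:
    """Turn an incoming AdmissionReview request into the AdmissionReview
    response the API server expects: allow the pod only when the
    required label is present."""
    req = admission_review["request"]
    labels = req["object"]["metadata"].get("labels") or {}
    allowed = REQUIRED_LABEL in labels
    response = {"uid": req["uid"], "allowed": allowed}
    if not allowed:
        # This message is what the team creating the job will see
        # when their pod is rejected.
        response["status"] = {
            "message": f"pod must carry the '{REQUIRED_LABEL}' label "
                       "so failure alerts can be routed"
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

A nice side effect: the rejection message surfaces directly in the other team's tooling at creation time, so your container never has to know about its own labels.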