Currently I have set up Kubernetes with kube-state-metrics, Prometheus and Loki. For most things this works really well, but the one thing I am struggling with is finding the exact reason why containers might have restarted.
For running pods, it's quite easy to see with kubectl describe pod. For example, I get the events below:
kubectl describe pod pod-name
.....
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 19m (x9 over 3h29m) kubelet message-here
This is very useful for troubleshooting exactly why a container has a lot of restarts, especially when using probes. There are other useful events as well.
However, I cannot see any way to save these kinds of events in either Loki or Prometheus, but maybe I am missing something. I had expected kube-state-metrics to include such information, but that seems not to be the case, and I also do not see it anywhere in Loki.
Any tips on how I can save such events?
Whilst Prometheus and KSM are more focused on metrics (e.g. 'N pod restarts'), Loki can be used for capturing events.
A good overview guide is here, which uses eventrouter to push events into a backend (Loki or Elasticsearch, for example).
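
To make the idea concrete, here is a minimal sketch of what shipping an event to Loki amounts to: turning a Kubernetes Event object into the JSON body that Loki's push endpoint (/loki/api/v1/push) accepts. This is not how eventrouter itself is implemented. The function name event_to_loki_payload, the label set, and the "kubernetes-events" job label are assumptions chosen for illustration.

```python
import json
from datetime import datetime, timezone


def event_to_loki_payload(event: dict) -> dict:
    """Convert one Kubernetes Event (as a dict) into a Loki push payload.

    Hypothetical helper for illustration; the label choices are assumptions.
    """
    involved = event.get("involvedObject", {})

    # Keep labels low-cardinality: namespace, type and reason only.
    # Pod names and messages go into the log line instead.
    labels = {
        "job": "kubernetes-events",  # assumed label value
        "namespace": involved.get("namespace", "default"),
        "type": event.get("type", "Normal"),
        "reason": event.get("reason", "Unknown"),
    }

    # Events carry RFC 3339 timestamps such as 2024-01-01T00:00:00Z.
    ts = event.get("lastTimestamp") or event.get("firstTimestamp")
    if ts:
        dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    else:
        dt = datetime.now(timezone.utc)

    # Loki expects the timestamp as a string of Unix epoch nanoseconds.
    nanos = str(int(dt.timestamp()) * 1_000_000_000)

    line = json.dumps(
        {
            "kind": involved.get("kind"),
            "name": involved.get("name"),
            "message": event.get("message"),
            "count": event.get("count", 1),
        },
        sort_keys=True,
    )

    return {"streams": [{"stream": labels, "values": [[nanos, line]]}]}


if __name__ == "__main__":
    sample = {
        "type": "Warning",
        "reason": "Unhealthy",
        "message": "message-here",
        "count": 9,
        "lastTimestamp": "2024-01-01T00:00:00Z",
        "involvedObject": {"kind": "Pod", "namespace": "default", "name": "pod-name"},
    }
    print(json.dumps(event_to_loki_payload(sample), indent=2))
```

The resulting dict can be sent as a JSON POST to the Loki push endpoint. In practice a ready-made forwarder such as eventrouter does the watching and shipping for you, which matters because the API server only retains events for a limited time (one hour by default), so they have to be exported continuously rather than queried after the fact.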