kubernetes monitoring prometheus grafana-loki

Monitoring Kubernetes pod health events

I currently have a Kubernetes cluster set up with kube-state-metrics, Prometheus and Loki. For most things this works really well, but the one thing I am struggling with is finding the exact reason why containers have restarted.

For running pods it's quite easy to see. With kubectl describe pod, for example, I get the events below:

kubectl describe pod pod-name
.....
Events:
  Type     Reason     Age                  From     Message
  ----     ------     ----                 ----     -------
  Warning  Unhealthy  19m (x9 over 3h29m)  kubelet  message-here

This is very useful for troubleshooting exactly why a container has a lot of restarts, especially when using probes. There are other useful events as well.

However, I cannot see any way to save these kinds of events in either Loki or Prometheus, though maybe I am missing something. I had expected kube-state-metrics to include this information, but that does not seem to be the case, and I do not see it anywhere in Loki either.

Any tips on how I can save such events?

Solution

Whilst Prometheus and kube-state-metrics (KSM) are focussed on metrics, e.g. 'N pod restarts', Loki can be used to capture the events themselves.

A good overview guide is here; it uses eventrouter to push events into a backend (Loki or Elasticsearch, for example).
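
To give a feel for what "pushing an event into Loki" amounts to, here is a minimal Python sketch. It is not how eventrouter works internally; it only illustrates turning one Kubernetes Event (in the JSON shape returned by kubectl get events -o json) into a body for Loki's HTTP push endpoint, /loki/api/v1/push. The function names, the job label value and the http://loki:3100 address are assumptions made for the example.

```python
import json
import urllib.request
from datetime import datetime, timezone


def event_to_loki_payload(event: dict) -> dict:
    """Build a Loki push body from one Kubernetes Event given as a dict."""
    obj = event.get("involvedObject", {})
    # lastTimestamp can be null on some events, so fall back to creation time.
    ts = event.get("lastTimestamp") or event["metadata"]["creationTimestamp"]
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    # Loki expects the timestamp as a string of nanoseconds since the epoch.
    ts_ns = str(int(parsed.timestamp()) * 1_000_000_000)
    # Keep labels low-cardinality; per-pod detail goes into the log line itself.
    labels = {
        "job": "kubernetes-events",  # assumed label value
        "namespace": obj.get("namespace", ""),
        "type": event.get("type", ""),
    }
    line = json.dumps(
        {
            "kind": obj.get("kind"),
            "name": obj.get("name"),
            "reason": event.get("reason"),
            "message": event.get("message"),
            "count": event.get("count"),
        },
        sort_keys=True,
    )
    return {"streams": [{"stream": labels, "values": [[ts_ns, line]]}]}


def push_to_loki(payload: dict, base_url: str = "http://loki:3100") -> int:
    """POST the payload to Loki; base_url is a placeholder for your own setup."""
    req = urllib.request.Request(
        base_url + "/loki/api/v1/push",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Sample event mirroring the kubectl describe output in the question.
sample_event = {
    "metadata": {"creationTimestamp": "2024-01-01T00:00:00Z"},
    "involvedObject": {"kind": "Pod", "namespace": "default", "name": "pod-name"},
    "type": "Warning",
    "reason": "Unhealthy",
    "message": "message-here",
    "count": 9,
    "lastTimestamp": "2024-01-01T00:00:00Z",
}
payload = event_to_loki_payload(sample_event)
```

Once events are stored this way, a LogQL query such as {job="kubernetes-events", type="Warning"} |= "pod-name" should bring back the probe failures for a pod even after it has been replaced. The pod name and reason are kept in the log line rather than in labels so that the number of Loki streams stays small.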