kubernetes, prometheus, kubernetes-helm, prometheus-operator

Syncing Prometheus and Kubewatch with Kubernetes Cluster


I want to collect all the events that have occurred in a Kubernetes cluster into a Python dictionary, ideally through an API that can return events from the past. I read that this is possible by storing Kubewatch data in Prometheus and querying it later, but I can't figure out how to set that up and view all past Pod events from Python. Alternative ways to access past events are also appreciated. Thanks!


Solution

  • I'll describe a solution that is not complicated and that I think meets all your requirements. There are tools such as Eventrouter that take Kubernetes events and push them to a user-specified sink. However, as you mentioned, you only need Pod events, so I suggest a slightly different approach.

    In short, you can run the kubectl get events --watch command from within a Pod and collect the output from that command using a log aggregation system like Loki.

    Below, I will provide a detailed step-by-step explanation.

    1. Running kubectl command from within a Pod

    To display only Pod events, you can use:

    $ kubectl get events --watch --field-selector involvedObject.kind=Pod
    

    We want to run this command from within a Pod. For security reasons, I've created a separate events-collector ServiceAccount bound to the built-in view ClusterRole, and our Pod will run under this ServiceAccount.
    NOTE: I've created a Deployment instead of a single Pod.

    $ cat all-in-one.yml
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: events-collector
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: events-collector-binding
    subjects:
      - kind: ServiceAccount
        name: events-collector
        namespace: default
    roleRef:
      kind: ClusterRole
      name: view
      apiGroup: rbac.authorization.k8s.io
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: events-collector
      name: events-collector
    spec:
      selector:
        matchLabels:
          app: events-collector
      template:
        metadata:
          labels:
            app: events-collector
        spec:
          serviceAccountName: events-collector
          containers:
          - image: bitnami/kubectl
            name: test
            command: ["kubectl"]
            args: ["get","events", "--watch", "--field-selector", "involvedObject.kind=Pod"]
    

    After applying the above manifest, the events-collector Deployment was created and collects Pod events as expected:

    $ kubectl apply -f all-in-one.yml
    serviceaccount/events-collector created
    clusterrolebinding.rbac.authorization.k8s.io/events-collector-binding created
    deployment.apps/events-collector created
    
    $ kubectl get deploy,pod | grep events-collector
    deployment.apps/events-collector           1/1     1            1           14s
    pod/events-collector-d98d6c5c-xrltj            1/1     Running   0          14s
    
    $ kubectl logs -f events-collector-d98d6c5c-xrltj
    LAST SEEN   TYPE     REASON      OBJECT                                         MESSAGE
    77s         Normal   Scheduled   pod/app-1-5d9ccdb595-m9d5n                     Successfully assigned default/app-1-5d9ccdb595-m9d5n to gke-cluster-2-default-pool-8505743b-brmx
    76s         Normal   Pulling     pod/app-1-5d9ccdb595-m9d5n                     Pulling image "nginx"
    71s         Normal   Pulled      pod/app-1-5d9ccdb595-m9d5n                     Successfully pulled image "nginx" in 4.727842954s
    70s         Normal   Created     pod/app-1-5d9ccdb595-m9d5n                     Created container nginx
    70s         Normal   Started     pod/app-1-5d9ccdb595-m9d5n                     Started container nginx
    73s         Normal   Scheduled   pod/app-2-7747dcb588-h8j4q                     Successfully assigned default/app-2-7747dcb588-h8j4q to gke-cluster-2-default-pool-8505743b-p7qt
    72s         Normal   Pulling     pod/app-2-7747dcb588-h8j4q                     Pulling image "nginx"
    67s         Normal   Pulled      pod/app-2-7747dcb588-h8j4q                     Successfully pulled image "nginx" in 4.476795932s
    66s         Normal   Created     pod/app-2-7747dcb588-h8j4q                     Created container nginx
    66s         Normal   Started     pod/app-2-7747dcb588-h8j4q                     Started container nginx
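
Since the question asks for Python dictionaries, here is a minimal sketch of how one row of the table above could be turned into a dict. The `parse_event_line` helper and its key names are illustrative and not part of kubectl; it assumes the default five-column layout (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE) shown in the logs.

```python
import re

# Columns printed by `kubectl get events`: LAST SEEN, TYPE, REASON, OBJECT, MESSAGE.
# The first four are single tokens; the message is everything that follows.
EVENT_LINE = re.compile(
    r"^(?P<last_seen>\S+)\s+(?P<type>\S+)\s+(?P<reason>\S+)\s+"
    r"(?P<object>\S+)\s+(?P<message>.*)$"
)


def parse_event_line(line):
    """Return a dict for one event row, or None for headers and blank lines."""
    line = line.strip()
    if not line or line.startswith("LAST SEEN"):
        return None
    match = EVENT_LINE.match(line)
    return match.groupdict() if match else None
```

The header and empty lines come back as None, so they can be filtered out before the rows are stored or processed further.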
    

    2. Installing Loki

    You can install Loki to store logs and process queries. Loki is like Prometheus, but for logs :). The easiest way to install Loki is to use the grafana/loki-stack Helm chart:

    $ helm repo add grafana https://grafana.github.io/helm-charts
    "grafana" has been added to your repositories
    
    $ helm repo update
    ...
    Update Complete. ⎈Happy Helming!⎈
    
    $ helm upgrade --install loki grafana/loki-stack
    
    
    $ kubectl get pods | grep loki
    loki-0                            1/1     Running   0          76s
    loki-promtail-hm8kn               1/1     Running   0          76s
    loki-promtail-nkv4p               1/1     Running   0          76s
    loki-promtail-qfrcr               1/1     Running   0          76s
    

    3. Querying Loki with LogCLI

    You can use the LogCLI tool to run LogQL queries against a Loki server. Detailed information on installing and using this tool can be found in the LogCLI documentation. I'll demonstrate how to install it on Linux:

    $ wget https://github.com/grafana/loki/releases/download/v2.2.1/logcli-linux-amd64.zip
    
    $ unzip logcli-linux-amd64.zip
    Archive:  logcli-linux-amd64.zip
      inflating: logcli-linux-amd64
      
    $ mv logcli-linux-amd64 logcli
    
    $ sudo cp logcli /bin/
    
    $ whereis logcli
    logcli: /bin/logcli
    

    To query the Loki server from outside the Kubernetes cluster, you may need to expose it using the Ingress resource:

    $ cat ingress.yml
    # networking.k8s.io/v1beta1 Ingress was removed in Kubernetes 1.22; v1 is used here.
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      annotations:
        kubernetes.io/ingress.class: nginx
        nginx.ingress.kubernetes.io/rewrite-target: /
      name: loki-ingress
    spec:
      rules:
        - http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: loki
                    port:
                      number: 3100
                
     
    $ kubectl apply -f ingress.yml
    ingress.networking.k8s.io/loki-ingress created
    
    $ kubectl get ing
    NAME           CLASS    HOSTS   ADDRESS         PORTS   AGE
    loki-ingress   <none>   *      <PUBLIC_IP>      80      19s
    

    Finally, I've created a simple Python script that we can use to query the Loki server:
    NOTE: LogCLI reads the server address from the LOKI_ADDR environment variable, as described in the documentation. Replace <PUBLIC_IP> with your Ingress IP.

    $ cat query_loki.py
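    #!/usr/bin/env python3
    
    import os
    import subprocess
    
    # Point LogCLI at the Loki server exposed through the Ingress.
    env = dict(os.environ, LOKI_ADDR="http://<PUBLIC_IP>")
    
    # Run the LogQL query; logcli prints the matching log lines to stdout.
    subprocess.run(["logcli", "query", '{app="events-collector"}'], env=env, check=True)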
    
    
    $ ./query_loki.py
    ...
    2021-07-02T10:33:01Z {} 2021-07-02T10:33:01.626763464Z stdout F 0s          Normal    Pulling       pod/backend-app-5d99cf4b-c9km4                               Pulling image "nginx"
    2021-07-02T10:33:00Z {} 2021-07-02T10:33:00.836755152Z stdout F 0s          Normal    Scheduled     pod/backend-app-5d99cf4b-c9km4                               Successfully assigned default/backend-app-5d99cf4b-c9km4 to gke-cluster-1-default-pool-328bd2b1-288w
    2021-07-02T10:33:00Z {} 2021-07-02T10:33:00.649954267Z stdout F 0s          Normal    Started       pod/web-app-6fcf9bb7b8-jbrr9                                 Started container nginx
    2021-07-02T10:33:00Z {} 2021-07-02T10:33:00.54819851Z stdout F 0s          Normal    Created       pod/web-app-6fcf9bb7b8-jbrr9                                 Created container nginx
    2021-07-02T10:32:59Z {} 2021-07-02T10:32:59.414571562Z stdout F 0s          Normal    Pulled        pod/web-app-6fcf9bb7b8-jbrr9                                 Successfully pulled image "nginx" in 4.228468876s
    ...
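
LogCLI talks to Loki's HTTP API, so the same query can also be sent from Python directly and the result turned into dictionaries. The sketch below is one possible approach under stated assumptions, not a definitive implementation: it calls the `/loki/api/v1/query_range` endpoint, assumes the stored log lines keep the CRI prefix (`<timestamp> stdout F`) visible in the output above, and the names `streams_to_events`, `fetch_events` and `logged_at_ns` are made up for illustration.

```python
import json
import re
import urllib.parse
import urllib.request

# Optional CRI prefix ("<timestamp> stdout F ") followed by the kubectl columns.
LINE = re.compile(
    r"^(?:\S+\s+(?:stdout|stderr)\s+[FP]\s+)?"
    r"(?P<last_seen>\S+)\s+(?P<type>\S+)\s+(?P<reason>\S+)\s+"
    r"(?P<object>\S+)\s+(?P<message>.*)$"
)


def streams_to_events(payload):
    """Flatten a Loki query_range response into a list of event dicts."""
    events = []
    for stream in payload["data"]["result"]:
        for timestamp_ns, line in stream["values"]:
            match = LINE.match(line.strip())
            if match is None or match["last_seen"] == "LAST":
                continue  # skip the "LAST SEEN ..." header and unparseable lines
            event = match.groupdict()
            event["logged_at_ns"] = int(timestamp_ns)  # Loki entry timestamp
            events.append(event)
    return events


def fetch_events(loki_addr, query='{app="events-collector"}', limit=100,
                 start_ns=None, end_ns=None):
    """Query Loki over HTTP and return the parsed events (needs network access)."""
    params = {"query": query, "limit": limit}
    if start_ns is not None:
        params["start"] = start_ns  # Unix epoch in nanoseconds
    if end_ns is not None:
        params["end"] = end_ns
    url = f"{loki_addr}/loki/api/v1/query_range?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        return streams_to_events(json.load(response))
```

Calling `fetch_events("http://<PUBLIC_IP>")` should then return a list of dicts with the keys `last_seen`, `type`, `reason`, `object`, `message` and `logged_at_ns`. Without `start_ns` and `end_ns`, Loki only searches a recent default window, so pass both to reach older events.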