Tags: kubernetes, prometheus, kubernetes-pod

How to find metrics about CPU/MEM for the pod running on a Kubernetes cluster on Prometheus


I have Prometheus set up via Helm from Terraform, and it is configured to connect to my Kubernetes cluster. When I open Prometheus, I am not sure which metric to choose from the list to view the CPU/MEM of running pods/jobs. Here are all the running pods, listed with this command (test1 is the Kubernetes namespace):

kubectl -n test1 get pods

[screenshot: podsrunning]

In Prometheus, I see many CPU-related metrics, but I am not sure which one to choose:

[screenshot: prom1]

I tried one of them, but it has namespace = prometheus and comes from prometheus-node-exporter, and I don't see my cluster or my namespace test1 anywhere in the results.

[screenshot: prom2]

Could you please help me? Thank you very much in advance.

UPDATE SCREENSHOT: I need to concentrate on one specific namespace. With the command kubectl get pods --all-namespaces | grep hermatwin, the first line shows namespace = jobs, so I think this is the namespace. [screenshot: promQL1]

No result when I set the calendar to last Friday: [screenshot: promQL2]

UPDATE SCREENSHOT April 20: I tried to select 2 days, starting last Saturday, 17 April, but I don't see any result: [screenshot: noResult1]

And if I remove the (namespace="jobs") condition, I don't see any result either: [screenshot: noresult2]

I reran the job (simulation jobs) just now and executed the Prometheus query while the job was still running, but I don't get any result :-( Here you can see my jobs while they were running.

[screenshot: jobsRunning]

I don't get any result: [screenshot: noresult3]

When using a simple filter, just container_cpu_usage_seconds_total, I can see namespace="jobs": [screenshot: resultnamespacejobs]

[screenshot: iRate1]

[screenshot: ResultJob]


Solution

  • node_cpu_seconds_total is a metric from node-exporter, the exporter that provides machine-level statistics; its metrics are prefixed with node_. You need metrics from cAdvisor, which produces container-level metrics prefixed with container_:

    container_cpu_usage_seconds_total
    container_cpu_load_average_10s
    container_memory_usage_bytes
    container_memory_rss
    

    Here are some basic queries to get you started. They may need tweaking, since your label names may differ:

    CPU Utilisation Per Pod

    sum(irate(container_cpu_usage_seconds_total{container!="POD", container=~".+"}[2m])) by (pod)
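
    For the namespace from the question, the same query can be restricted with a label matcher. This is a sketch that assumes the cAdvisor series carry a namespace label with the value jobs, as the question's last screenshots suggest. rate over a 5m window is swapped in for irate here to smooth out short spikes; that is a choice, not a requirement:

    sum(rate(container_cpu_usage_seconds_total{namespace="jobs", container!="POD", container=~".+"}[5m])) by (pod)

    The result is in CPU cores per pod (CPU seconds consumed per second of wall time).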
    

    RAM Usage Per Pod

    sum(container_memory_usage_bytes{container!="POD", container=~".+"}) by (pod)
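
    container_memory_usage_bytes includes page cache, so it can look higher than expected. A possible variant, assuming your cAdvisor version exposes container_memory_working_set_bytes and that the namespace label is jobs as in the question:

    sum(container_memory_working_set_bytes{namespace="jobs", container!="POD", container=~".+"}) by (pod)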
    

    In/Out Traffic Rate Per Pod

    Beware that pods in host network mode (not isolated) show the traffic rate of the whole node. The * 8 converts bytes to bits for convenience (MBit/s, GBit/s, etc.).

    # incoming
    sum(irate(container_network_receive_bytes_total[2m])) by (pod) * 8
    # outgoing
    sum(irate(container_network_transmit_bytes_total[2m])) by (pod) * 8
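
    Checking Which Labels Exist

    If one of the queries above returns nothing, it may help to look at the label names and values the series actually carry before filtering on them. A minimal sketch, assuming the label names namespace, pod and container; older Kubernetes versions exposed pod_name and container_name instead, in which case the by (...) clause and the matchers need to be adjusted:

    # one row per namespace/pod/container combination that currently has samples
    count(container_cpu_usage_seconds_total) by (namespace, pod, container)

    Also note that rate and irate need at least two samples inside the range window, so a very short-lived job may produce no result with [2m] even though the raw metric is present.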