Tags: prometheus, grafana, amazon-eks, prometheus-operator, prometheus-node-exporter

Master Prometheus is not able to scrape container metrics from an EKS cluster in AWS


I have an AWS account with two EKS clusters, say EKS_A and EKS_B. EKS_A is in us-east-1 and EKS_B is in us-west-1, both in the same AWS account. On each of these EKS clusters, I have a Prometheus namespace running the following pods:

pod/kube-state-metrics
pod/prometheus-alertmanager
pod/prometheus-node-exporter
pod/prometheus-pushgateway
pod/prometheus-server

daemonset.apps/prometheus-node-exporter 

deployment.apps/kube-state-metrics
deployment.apps/prometheus-pushgateway 

Each of these EKS clusters exposes its metrics through its own endpoint, and both endpoints are consumed by a master Prometheus (which has a web UI to show the metrics) set up in a separate Kubernetes cluster that is not part of AWS.
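A setup like this is typically done with Prometheus federation: the master Prometheus scrapes the `/federate` endpoint of each cluster-local Prometheus. The sketch below shows what such a federation job might look like on the master; the target hostnames are placeholders, not details from the original post:

```yaml
# Hypothetical federation job on the master Prometheus.
# The target hostnames are placeholders for the two EKS endpoints.
scrape_configs:
  - job_name: 'federate-eks'
    scrape_interval: 30s
    honor_labels: true            # keep the job/instance labels from the downstream Prometheus
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job=~".+"}'           # pull series for all jobs from the downstream Prometheus
    static_configs:
      - targets:
          - 'prometheus.eks-a.example.com'   # EKS_A in us-east-1
          - 'prometheus.eks-b.example.com'   # EKS_B in us-west-1
```

If the `match[]` selector is narrower (e.g. matching specific job names), any series whose labels do not match it will silently not appear on the master, which is relevant to the problem described below.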

Now the problem is: the master Prometheus is able to show all the metrics scraped from the EKS_A cluster in us-east-1, but it is not able to show the container-related metrics from the EKS_B cluster in us-west-1.

This means the container metrics below are available in the master Prometheus for the EKS_A cluster, but they are missing for the EKS_B cluster:

container_cpu_cfs_periods_total
container_cpu_cfs_throttled_periods_total
container_cpu_cfs_throttled_seconds_total
container_cpu_load_average_10s
container_cpu_system_seconds_total
container_cpu_usage_seconds_total
container_cpu_user_seconds_total
container_file_descriptors
container_fs_inodes_free
container_fs_inodes_total
container_fs_io_current
container_fs_io_time_seconds_total
container_fs_io_time_weighted_seconds_total
container_fs_limit_bytes
container_fs_read_seconds_total

Please note that the master Prometheus UI is able to show all the metrics from the EKS_B cluster except the container_* metrics listed above.

Any idea why this could be happening and how to resolve it?

Thank you


Solution

  • cAdvisor monitors resource usage and analyzes the performance of containers; the container_* metrics come from it. In the Prometheus config file, instead of using the job name cadvisor, I had used Kubernetes-cadvisor, which caused this issue. After changing Kubernetes-cadvisor to cadvisor, the issue was resolved.
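For illustration, the relevant scrape job on the EKS_B Prometheus might look like the sketch below (field values are typical for scraping cAdvisor through the kubelet and are assumptions, not taken from the original post). The fix was the `job_name` line:

```yaml
# Sketch of the cAdvisor scrape job in the EKS cluster's Prometheus config.
# The job label produced here must match whatever the master/federation
# side selects on, so the name matters.
scrape_configs:
  - job_name: 'cadvisor'          # was 'Kubernetes-cadvisor', which broke the metrics on the master
    scheme: https
    metrics_path: /metrics/cadvisor
    kubernetes_sd_configs:
      - role: node                # one target per node, via the kubelet
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
```

Since the scrape job's name becomes the `job` label on every series it produces, a mismatched name means the container_* series carry an unexpected label value and fall outside any selector that expects `job="cadvisor"`, which is why only those metrics disappeared while everything else federated fine.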