For troubleshooting purposes I decided to deploy a very vanilla installation of Prometheus (which includes node-exporter) via helm install exporter stable/prometheus.
However, I can't get the node-exporter pods to start. I've searched high and low and I'm not sure where else to turn. I'm able to install many other apps on my cluster; this is the only one that fails. I've attached some troubleshooting output for reference. I initially suspected it had something to do with "tolerations", but I'm still digging in.
The EKS cluster runs on 3 t2.large nodes, each of which can support up to 35 pods, and I'm running a total of 43 pods, so capacity shouldn't be the problem. Any other troubleshooting ideas would be greatly appreciated.
Get Pods Output
✗ kubectl get pods
NAME                                      READY   STATUS    RESTARTS   AGE
exporter-prometheus-node-exporter-bcwc4   0/1     Pending   0          15m
exporter-prometheus-node-exporter-kr7z7   0/1     Pending   0          15m
exporter-prometheus-node-exporter-lw87g   0/1     Pending   0          15m
Describe Pods
Name:                 exporter-prometheus-node-exporter-bcwc4
Namespace:            monitoring
Priority:             0
Node:                 <none>
Labels:               app=prometheus
                      chart=prometheus-11.1.2
                      component=node-exporter
                      controller-revision-hash=668b4894bb
                      heritage=Helm
                      pod-template-generation=1
                      release=exporter
Annotations:          kubernetes.io/psp: eks.privileged
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        DaemonSet/exporter-prometheus-node-exporter
Containers:
  prometheus-node-exporter:
    Image:      prom/node-exporter:v0.18.1
    Port:       9100/TCP
    Host Port:  9100/TCP
    Args:
      --path.procfs=/host/proc
      --path.sysfs=/host/sys
    Environment:  <none>
    Mounts:
      /host/proc from proc (ro)
      /host/sys from sys (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from exporter-prometheus-node-exporter-token-rl4fm (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  proc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:
  sys:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:
  exporter-prometheus-node-exporter-token-rl4fm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  exporter-prometheus-node-exporter-token-rl4fm
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/network-unavailable:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/pid-pressure:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  2s (x24 over 29m)  default-scheduler  0/3 nodes are available: 2 node(s) didn't match node selector, 3 node(s) didn't have free ports for the requested pod ports.
DaemonSet Config
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  creationTimestamp: "2020-05-12T06:15:30Z"
  generation: 1
  labels:
    app: prometheus
    chart: prometheus-11.1.2
    component: node-exporter
    heritage: Helm
    release: exporter
  name: exporter-prometheus-node-exporter
  namespace: monitoring
  resourceVersion: "8131959"
  selfLink: /apis/extensions/v1beta1/namespaces/monitoring/daemonsets/exporter-prometheus-node-exporter
  uid: 5ede0739-cd05-4e3b-ace1-87fafb33314a
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: prometheus
      component: node-exporter
      release: exporter
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: prometheus
        chart: prometheus-11.1.2
        component: node-exporter
        heritage: Helm
        release: exporter
    spec:
      containers:
      - args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        image: prom/node-exporter:v0.18.1
        imagePullPolicy: IfNotPresent
        name: prometheus-node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: metrics
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /host/proc
          name: proc
          readOnly: true
        - mountPath: /host/sys
          name: sys
          readOnly: true
      dnsPolicy: ClusterFirst
      hostNetwork: true
      hostPID: true
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: exporter-prometheus-node-exporter
      serviceAccountName: exporter-prometheus-node-exporter
      terminationGracePeriodSeconds: 30
      volumes:
      - hostPath:
          path: /proc
          type: ""
        name: proc
      - hostPath:
          path: /sys
          type: ""
        name: sys
  templateGeneration: 1
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
status:
  currentNumberScheduled: 3
  desiredNumberScheduled: 3
  numberMisscheduled: 0
  numberReady: 0
  numberUnavailable: 3
  observedGeneration: 1
  updatedNumberScheduled: 3
"3 node(s) didn't have free ports for the requested pod ports."
The error shows that the requested host port is already in use on every node. Because your pod spec defines hostPort: 9100, each <hostIP, hostPort, protocol> combination must be unique across the cluster, which limits the number of nodes the pod can be scheduled onto: if something is already bound to port 9100 on a node, no new pod requesting that host port can land there. Ref: https://kubernetes.io/docs/concepts/configuration/overview/#services
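Assuming port 9100 really is held by another workload on your nodes, one way out is to move node-exporter to a different host port with a Helm values override. The key names below follow the stable/prometheus chart's nodeExporter section; verify them against your chart version with helm show values stable/prometheus before applying:

# values.yaml -- used as: helm install exporter stable/prometheus -f values.yaml
# Moves node-exporter off the contested host port 9100.
# NOTE: key layout assumes the stable/prometheus chart; double-check with
# `helm show values stable/prometheus`.
nodeExporter:
  service:
    hostPort: 9101        # host port to bind instead of 9100
    servicePort: 9101     # service port, kept in sync with hostPort

Alternatively, find out which pods already claim the port before changing anything, e.g.:
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.spec.containers[*].ports[*].hostPort}{"\n"}{end}' | grep 9100
A stale node-exporter DaemonSet from an earlier release is a common culprit.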