The `ingress-nginx` pod I have helm-installed into my EKS cluster is perpetually failing, its logs indicating the application cannot bind to `0.0.0.0:8443` (`INADDR_ANY:8443`). I have confirmed that `0.0.0.0:8443` is indeed already bound in the container, but because I don't yet have root access to the container I've been unable to glean the culprit process/user.
I have created this issue on the Kubernetes ingress-nginx project that I'm using, but also wanted to reach out to the wider SO community for insights, solutions, and troubleshooting suggestions on how to get past this hurdle.
As a newcomer to both AWS/EKS and Kubernetes, I suspect some environment configuration error on my part is causing this issue. For example, is it possible that this could be caused by a misconfigured AWS-ism such as the VPC (its Subnets or Security Groups)? Thank you in advance for your help!
The linked GitHub issue provides copious details about the Terraform-provisioned EKS environment as well as the Helm-installed deployment of `ingress-nginx`. Here are some key details:
1. The cluster has two Fargate Profiles, `coredns` and `ingress`, dedicated to kube-system/kube-dns and ingress-nginx, respectively. Other than the selectors' namespaces and labels, there is nothing "custom" about the profile specification. It has been confirmed that the selectors are working, both for coredns and ingress; i.e. the ingress pods are scheduled to run, but failing.
2. The reason `ingress-nginx` is using port 8443 is that I first ran into this Privilege Escalation issue, whose workaround requires one to disable `allowPrivilegeEscalation` and change ports from privileged to unprivileged ones.
3. I'm invoking `helm install` with the following values (the full install command is sketched just after this list):
```
controller:
  extraArgs:
    http-port: 8080
    https-port: 8443
  containerPort:
    http: 8080
    https: 8443
  service:
    ports:
      http: 80
      https: 443
    targetPorts:
      http: 8080
      https: 8443
  image:
    allowPrivilegeEscalation: false
  # https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes
  livenessProbe:
    initialDelaySeconds: 60  # 30
  readinessProbe:
    initialDelaySeconds: 60  # 0
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
```
4. At first I fiddled with the probes' `initialDelaySeconds` in the values passed to the `helm install`, suspecting the controller simply needed more time to start.
5. But eventually I looked at the pod/container logs and found that, regardless of the *ness probe settings, every time I reinstall the `ingress-nginx` pod and wait a bit, the logs will indicate the same bind error reported here:
```
2021/11/12 17:15:02 [emerg] 27#27: bind() to [::]:8443 failed (98: Address in use)
.
.
```
6. Aside from what I've noted above, I haven't intentionally configured anything "non-stock". I'm a bit lost in AWS/K8s's sea of configuration looking for what piece I need to adapt/correct.
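For completeness, the install command itself is unremarkable; it looks roughly like the following (the release name `nginx-ingress` and namespace `ingress` are inferred from the pod name shown further below, and the values above are assumed to be saved locally as values.yaml):

```
> helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
> helm install nginx-ingress ingress-nginx/ingress-nginx \
    --namespace ingress \
    --values values.yaml
```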
Do you have clues or guesses why INADDR_ANY, port 8443 would already be bound in my (fairly-standard) `nginx-ingress-ingress-nginx-controller` pod/container?
As I alluded to earlier, I am able to execute the `netstat` command inside the running container as the default user `www-data` to confirm that 0.0.0.0:8443 is indeed already bound, but because I haven't yet figured out how to get root access, the PID/name of the listening processes is not shown to me:
```
> kubectl exec -n ingress --stdin --tty nginx-ingress-ingress-nginx-controller-74d46b8fd8-85tkh -- netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:10245         0.0.0.0:*               LISTEN      -
tcp        3      0 127.0.0.1:10246         0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.1:10247         0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:8181            0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:8181            0.0.0.0:*               LISTEN      -
tcp        0      0 :::8443                 :::*                    LISTEN      -
tcp        0      0 :::10254                :::*                    LISTEN      -
tcp        0      0 :::8080                 :::*                    LISTEN      -
tcp        0      0 :::8080                 :::*                    LISTEN      -
tcp        0      0 :::8181                 :::*                    LISTEN      -
tcp        0      0 :::8181                 :::*                    LISTEN      -
```
```
> kubectl exec -n ingress --stdin --tty nginx-ingress-ingress-nginx-controller-74d46b8fd8-85tkh -- /bin/bash
bash-5.1$ whoami
www-data
bash-5.1$ ps aux
PID   USER     TIME  COMMAND
    1 www-data  0:00 /usr/bin/dumb-init -- /nginx-ingress-controller --publish-service=ingress/nginx-ingress-ingress-nginx-controller --election-id=ingress-controller-leader --controller-class=k8s.io/ingress-nginx
    8 www-data  0:00 /nginx-ingress-controller --publish-service=ingress/nginx-ingress-ingress-nginx-controller --election-id=ingress-controller-leader --controller-class=k8s.io/ingress-nginx --configmap=ingress/n
   28 www-data  0:00 nginx: master process /usr/local/nginx/sbin/nginx -c /etc/nginx/nginx.conf
   30 www-data  0:00 nginx: worker process
   45 www-data  0:00 /bin/bash
   56 www-data  0:00 ps aux
```
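In the meantime, one avenue I haven't exhausted: since every process in the container runs as the same `www-data` user, their /proc/<pid>/fd entries should be readable without root, which would let me match the 8443 listener's socket inode to a PID. A rough sketch (8443 is 20FB in hex; the inode 12345 below is just a placeholder for whatever the first command actually prints):

```
bash-5.1$ grep -i ':20FB' /proc/net/tcp6    # the 10th field of the matching row is the socket inode
bash-5.1$ for p in /proc/[0-9]*; do
>   ls -l "$p/fd" 2>/dev/null | grep -q 'socket:\[12345\]' && echo "$p"
> done
```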
I'm also currently looking into how to get root access to my Fargate containers (without mucking about with their Dockerfiles to install ssh...) so I can get more insight into which process/user is binding INADDR_ANY:8443 in this pod/container.
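One approach I'm evaluating is `kubectl debug` with an ephemeral container, which as I understand it can run as root while sharing the target container's process namespace. A sketch, assuming the cluster version has ephemeral containers enabled (they may not be available on EKS Fargate yet) and that the controller container is named `controller`:

```
> kubectl debug -n ingress -it nginx-ingress-ingress-nginx-controller-74d46b8fd8-85tkh \
    --image=busybox --target=controller -- sh
/ # netstat -tulpn    # PID/Program name should now be populated
```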
Posting a community wiki answer based on the discussion in the same topic and this similar issue (both on the project's GitHub page). Feel free to expand it.
The problem is that 8443 is already bound for the webhook. That's why I used 8081 in my suggestion, not 8443. The examples using 8443 here had to also move the webhook, which introduces more complexity to the changes, and can lead to weird issues if you get it wrong.
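If you did want to keep 8443 for HTTPS, the webhook listener itself would have to move to another port. A rough sketch of what that could look like, assuming your chart version exposes the admission webhook's listen port as `controller.admissionWebhooks.port` (check the chart's values.yaml before relying on this):

```
controller:
  extraArgs:
    http-port: 8080
    https-port: 8443
  admissionWebhooks:
    port: 8444   # assumption: moves the webhook off 8443 so nginx can bind it
```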
An example using port 8081:

As well as those settings, you'll also need to use the appropriate annotations to run using NLB rather than ELB, so all-up it ends up looking something like:
```
controller:
  extraArgs:
    http-port: 8080
    https-port: 8081
  containerPort:
    http: 8080
    https: 8081
  image:
    allowPrivilegeEscalation: false
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: "nlb-ip"
```
Edit: Fixed the aws-load-balancer-type to be `nlb-ip`, as that's required for Fargate. It probably should be

```
service.beta.kubernetes.io/aws-load-balancer-type: "external"
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
```

for current versions of the AWS Load Balancer Controller (2.2 onwards), but new versions will recognise the `nlb-ip` annotation.
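Putting the edit together with the port 8081 example above, a values sketch for a current chart and AWS Load Balancer Controller 2.2+ might look like this (treat it as a starting point and verify the annotations against the controller documentation for your version):

```
controller:
  extraArgs:
    http-port: 8080
    https-port: 8081
  containerPort:
    http: 8080
    https: 8081
  image:
    allowPrivilegeEscalation: false
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: "external"
      service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
```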