Tags: nginx, kubernetes, kubernetes-ingress, nginx-ingress

Nginx-ingress worker processes constantly restarting


I recently upgraded my ingress controller to kubernetes-ingress v1.10.0. The ingresses seem to route traffic correctly, but after checking the pod logs I noticed a huge number of notices being generated.
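For reference, I'm tailing the controller pods with something like this (the namespace and pod name are placeholders):

kubectl logs -f -n nginx-ingress nginx-ingress-xxxx

Here is a sample of the output: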

2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2748
2021/02/10 09:40:23 [notice] 19#19: worker process 2748 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2745
2021/02/10 09:40:23 [notice] 19#19: worker process 2745 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
W0210 09:40:23.416499       1 listers.go:79] can not retrieve list of objects using index : Index with name namespace does not exist
W0210 09:40:23.416812       1 listers.go:79] can not retrieve list of objects using index : Index with name namespace does not exist
W0210 09:40:23.416912       1 listers.go:79] can not retrieve list of objects using index : Index with name namespace does not exist
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2735
2021/02/10 09:40:23 [notice] 19#19: worker process 2735 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2737
2021/02/10 09:40:23 [notice] 19#19: worker process 2737 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2742 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2746
2021/02/10 09:40:23 [notice] 19#19: worker process 2746 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2744
2021/02/10 09:40:23 [notice] 19#19: worker process 2744 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2740
2021/02/10 09:40:23 [notice] 19#19: worker process 2740 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2736
2021/02/10 09:40:23 [notice] 19#19: worker process 2736 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2741
2021/02/10 09:40:23 [notice] 19#19: worker process 2734 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2741 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2739
2021/02/10 09:40:23 [notice] 19#19: worker process 2739 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2738
2021/02/10 09:40:23 [notice] 19#19: worker process 2738 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2743
2021/02/10 09:40:23 [notice] 19#19: worker process 2743 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2749
2021/02/10 09:40:23 [notice] 19#19: worker process 2749 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2747
2021/02/10 09:40:23 [notice] 19#19: worker process 2747 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [warn] 2718#2718: *6697105 an upstream response is buffered to a temporary file /var/cache/nginx/proxy_temp/6/79/0000214796 while reading upstream, client: xxxx, server: xxxx, request: "GET /xxxx HTTP/1.1", upstream: "xxxx", host: "xxxx", referrer: "xxxx"
2021/02/10 09:40:23 [notice] 2769#2769: signal process started
2021/02/10 09:40:23 [notice] 19#19: signal 1 (SIGHUP) received from 2769, reconfiguring
2021/02/10 09:40:23 [notice] 19#19: reconfiguring
2021/02/10 09:40:23 [notice] 19#19: using the "epoll" event method
2021/02/10 09:40:23 [notice] 19#19: start worker processes
2021/02/10 09:40:23 [notice] 19#19: start worker process 2770
2021/02/10 09:40:23 [notice] 19#19: start worker process 2771
2021/02/10 09:40:23 [notice] 19#19: start worker process 2772
2021/02/10 09:40:23 [notice] 19#19: start worker process 2773
2021/02/10 09:40:23 [notice] 19#19: start worker process 2774
2021/02/10 09:40:23 [notice] 19#19: start worker process 2775
2021/02/10 09:40:23 [notice] 19#19: start worker process 2776
2021/02/10 09:40:23 [notice] 19#19: start worker process 2777
2021/02/10 09:40:23 [notice] 19#19: start worker process 2778
2021/02/10 09:40:23 [notice] 19#19: start worker process 2779
2021/02/10 09:40:23 [notice] 19#19: start worker process 2780
2021/02/10 09:40:23 [notice] 19#19: start worker process 2781
2021/02/10 09:40:23 [notice] 19#19: start worker process 2782
2021/02/10 09:40:23 [notice] 19#19: start worker process 2783
2021/02/10 09:40:23 [notice] 19#19: start worker process 2784
2021/02/10 09:40:23 [notice] 19#19: start worker process 2785
90.114.22.230 - - [10/Feb/2021:09:40:23 +0000] "GET /xxxx HTTP/1.1" 200 352910 "xxxx" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:84.0) Gecko/20100101 Firefox/84.0" "-"
2021/02/10 09:40:23 [notice] 2753#2753: gracefully shutting down
2021/02/10 09:40:23 [notice] 2755#2755: gracefully shutting down
2021/02/10 09:40:23 [notice] 2760#2760: gracefully shutting down
2021/02/10 09:40:23 [notice] 2755#2755: exiting
2021/02/10 09:40:23 [notice] 2753#2753: exiting
2021/02/10 09:40:23 [notice] 2762#2762: gracefully shutting down
2021/02/10 09:40:23 [notice] 2760#2760: exiting
2021/02/10 09:40:23 [notice] 2766#2766: gracefully shutting down
2021/02/10 09:40:23 [notice] 2762#2762: exiting
2021/02/10 09:40:23 [notice] 2766#2766: exiting
2021/02/10 09:40:23 [notice] 2759#2759: gracefully shutting down
2021/02/10 09:40:23 [notice] 2759#2759: exiting
2021/02/10 09:40:23 [notice] 2763#2763: gracefully shutting down
2021/02/10 09:40:23 [notice] 2761#2761: gracefully shutting down
2021/02/10 09:40:23 [notice] 2767#2767: gracefully shutting down
2021/02/10 09:40:23 [notice] 2763#2763: exiting
2021/02/10 09:40:23 [notice] 2767#2767: exiting
2021/02/10 09:40:23 [notice] 2761#2761: exiting
2021/02/10 09:40:23 [notice] 2760#2760: exit
2021/02/10 09:40:23 [notice] 2753#2753: exit
2021/02/10 09:40:23 [notice] 2766#2766: exit
2021/02/10 09:40:23 [notice] 2764#2764: gracefully shutting down
2021/02/10 09:40:23 [notice] 2764#2764: exiting
2021/02/10 09:40:23 [notice] 2752#2752: gracefully shutting down
2021/02/10 09:40:23 [notice] 2752#2752: exiting
2021/02/10 09:40:23 [notice] 2763#2763: exit
2021/02/10 09:40:23 [notice] 2762#2762: exit
2021/02/10 09:40:23 [notice] 2764#2764: exit
2021/02/10 09:40:23 [notice] 2759#2759: exit
2021/02/10 09:40:23 [notice] 2755#2755: exit
2021/02/10 09:40:23 [notice] 2752#2752: exit
2021/02/10 09:40:23 [notice] 2767#2767: exit
2021/02/10 09:40:23 [notice] 2761#2761: exit
2021/02/10 09:40:23 [notice] 2758#2758: gracefully shutting down
2021/02/10 09:40:23 [notice] 2758#2758: exiting
2021/02/10 09:40:23 [notice] 2756#2756: gracefully shutting down
2021/02/10 09:40:23 [notice] 2756#2756: exiting
2021/02/10 09:40:23 [notice] 2758#2758: exit
2021/02/10 09:40:23 [notice] 2756#2756: exit
2021/02/10 09:40:23 [notice] 2765#2765: gracefully shutting down
2021/02/10 09:40:23 [notice] 2765#2765: exiting
2021/02/10 09:40:23 [notice] 2757#2757: gracefully shutting down
2021/02/10 09:40:23 [notice] 2757#2757: exiting
2021/02/10 09:40:23 [notice] 2754#2754: gracefully shutting down
2021/02/10 09:40:23 [notice] 2754#2754: exiting
2021/02/10 09:40:23 [notice] 2754#2754: exit
2021/02/10 09:40:23 [notice] 2765#2765: exit
2021/02/10 09:40:23 [notice] 2757#2757: exit
I0210 09:40:23.604803       1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"xxxx", Name:"xxxx", UID:"82a71705-194e-4919-a7e2-a511d52c1a7a", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"77919848", FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration for xxxx/xxxx was added or updated 
I0210 09:40:23.604873       1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"xxxx", Name:"xxxx", UID:"10246997-07ae-41e1-b811-0ec630647f3b", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"182677830", FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration for xxxx/xxxx was added or updated 
I0210 09:40:23.605520       1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"xxxx", Name:"xxxx", UID:"d628825f-1b06-4719-b4b0-4d971b8c0a54", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"182677778", FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration for xxxx/xxxx was added or updated 
I0210 09:40:23.605557       1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"xxxx", Name:"xxxx", UID:"4b7b1fa1-1d7d-41a5-9d97-5f5aee52ade7", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"182678922", FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration for xxxx/xxxx was added or updated 
I0210 09:40:23.605569       1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"xxxx", Name:"xxxx", UID:"b86b8b8e-82b9-40d0-b02d-073db557c0e1", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"182678955", FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration for xxxx/xxxx was added or updated 
I0210 09:40:23.605577       1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"xxxx", Name:"xxxx", UID:"585ccdee-9807-442e-9b4f-7d1a97264216", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"182677754", FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration for xxxx/xxxx was added or updated 
W0210 09:40:23.614001       1 listers.go:79] can not retrieve list of objects using index : Index with name namespace does not exist
W0210 09:40:23.614213       1 listers.go:79] can not retrieve list of objects using index : Index with name namespace does not exist
W0210 09:40:23.614304       1 listers.go:79] can not retrieve list of objects using index : Index with name namespace does not exist
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2763
2021/02/10 09:40:23 [notice] 19#19: worker process 2755 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2763 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2767 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2766
2021/02/10 09:40:23 [notice] 19#19: worker process 2752 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2753 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2766 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2756
2021/02/10 09:40:23 [notice] 19#19: worker process 2756 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2758 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2759 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2760 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2761 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2762 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2764 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2754
2021/02/10 09:40:23 [notice] 19#19: worker process 2754 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2765
2021/02/10 09:40:23 [notice] 19#19: worker process 2765 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received

This seems to loop forever, and very quickly, on all the pods. I deployed my controller using these manifests and recreated the default server secret as mentioned in the release notes.
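
To quantify the churn, the reload rate can be sampled by counting reconfiguration events in the controller log over a short window (pod name is again a placeholder):

kubectl logs -n nginx-ingress nginx-ingress-xxxx --since=1m | grep -c reconfiguring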

The controller arguments are:

args:
  - -nginx-configmaps=$(POD_NAMESPACE)/nginx-config
  - -default-server-tls-secret=$(POD_NAMESPACE)/default-server-secret
  - -global-configuration=$(POD_NAMESPACE)/nginx-configuration
  - -report-ingress-status
  - -enable-prometheus-metrics
  - -enable-snippets

And here is the content of my nginx-config CM:

data:
  client-max-body-size: 50m
  proxy-read-timeout: 5m
  server-tokens: "False"

Any idea what is happening there and how to solve this issue?

Edit:

After some more research, I found out that two of my ingresses are constantly being updated.
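For reference, this is roughly how I spotted it (namespace and resource names are placeholders):

kubectl describe ingress xxxx -n xxxx
kubectl get ingress -n xxxx -w -o custom-columns=NAME:.metadata.name,RV:.metadata.resourceVersion

The describe output for one of them looks like this: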

Name:             xxxx
Namespace:        xxxx
Address:          
Default backend:  default-http-backend:80 (<none>)
TLS:
  xxxx terminates xxxx
Rules:
  Host  Path  Backends
  ----  ----  --------
  *     *     default-http-backend:80 (<none>)
Annotations:
  ingress.kubernetes.io/ssl-redirect:                true
  kubectl.kubernetes.io/last-applied-configuration:  {"apiVersion":"extensions/v1beta1","kind":"Ingress","metadata":{"annotations":{"ingress.kubernetes.io/ssl-redirect":"true","kubernetes.io/ingress.class":"nginx","nginx.org/mergeable-ingress-type":"master"},"labels":{"app.kubernetes.io/component":"xxxx","app.kubernetes.io/instance":"xxxx","app.kubernetes.io/name":"xxxx","app.kubernetes.io/part-of":"xxxx","argocd.argoproj.io/instance":"xxxx"},"name":"xxxx","namespace":"xxxx"},"spec":{"rules":[{"host":"xxxx"}],"tls":[{"hosts":["xxxx"],"secretName":"xxxx"}]}}

  kubernetes.io/ingress.class:       nginx
  nginx.org/mergeable-ingress-type:  master
Events:
  Type    Reason          Age                       From                      Message
  ----    ------          ----                      ----                      -------
  Normal  AddedOrUpdated  3m5s (x2600127 over 6d)   nginx-ingress-controller  Configuration for xxxx/xxxx was added or updated
  Normal  AddedOrUpdated  2m12s (x2599793 over 6d)  nginx-ingress-controller  Configuration for xxxx/xxxx was added or updated
  Normal  AddedOrUpdated  66s (x2600182 over 6d)    nginx-ingress-controller  Configuration for xxxx/xxxx was added or updated

And here is the full manifest of one of the affected ingresses:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    ingress.kubernetes.io/ssl-redirect: "true"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"extensions/v1beta1","kind":"Ingress","metadata":{"annotations":{"ingress.kubernetes.io/ssl-redirect":"true","kubernetes.io/ingress.class":"nginx","nginx.org/mergeable-ingress-type":"master"},"labels":{"app.kubernetes.io/component":"xxxx","app.kubernetes.io/instance":"xxxx","app.kubernetes.io/name":"xxxx","app.kubernetes.io/part-of":"xxxx","argocd.argoproj.io/instance":"xxxx"},"name":"xxxx","namespace":"xxxx"},"spec":{"rules":[{"host":"xxxx"}],"tls":[{"hosts":["xxxx"],"secretName":"xxxx"}]}}
    kubernetes.io/ingress.class: nginx
    nginx.org/mergeable-ingress-type: master
  creationTimestamp: "2021-01-18T09:55:07Z"
  generation: 1
  labels:
    app.kubernetes.io/component: xxxx
    app.kubernetes.io/instance: xxxx
    app.kubernetes.io/name: xxxx
    app.kubernetes.io/part-of: xxxx
    argocd.argoproj.io/instance: xxxx
  name: xxxx
  namespace: xxxx
  resourceVersion: "182677754"
  selfLink: /apis/extensions/v1beta1/namespaces/xxxx/ingresses/xxxx
  uid: 585ccdee-9807-442e-9b4f-7d1a97264216
spec:
  rules:
  - host: xxxx
  tls:
  - hosts:
    - xxxx
    secretName: xxxx
status:
  loadBalancer:
    ingress:
    - {}

My environment is managed by ArgoCD, but after checking the ArgoCD logs it doesn't look like the updates are coming from it. I wonder if the updates are related to the -report-ingress-status option.
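
One way to narrow down which client keeps writing to the objects (assuming the cluster records managedFields, which recent Kubernetes versions do by default) is to list the field managers and their last update times:

kubectl get ingress xxxx -n xxxx -o jsonpath='{range .metadata.managedFields[*]}{.manager}{"\t"}{.operation}{"\t"}{.time}{"\n"}{end}'

If the controller itself shows up as the most recent manager, that would suggest the churn comes from its own status or configuration updates rather than from ArgoCD.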

Edit II: I removed the -report-ingress-status flag and it didn't change anything.


Solution

  • I don't know the actual root cause, but I deleted all the TLS secrets, certificates, and ingresses that were constantly being updated, then recreated them. That solved the issue.

    A couple of incidents happened prior to this one and might have been related: two of my three ingress nodes failed, and during the upgrade the wrong CRDs were applied before being quickly fixed.

    That's all I can say at the moment, but deleting the resources tied to the constantly updated ingresses and recreating them does solve the issue.
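
    For anyone hitting the same thing, the recovery amounted to something like this (names and namespaces are placeholders; strip status, resourceVersion, and uid from the exported manifests before re-applying):

    # back up the affected resources
    kubectl get ingress xxxx -n xxxx -o yaml > ingress.yaml
    kubectl get secret xxxx -n xxxx -o yaml > tls-secret.yaml

    # delete the churning objects
    kubectl delete ingress xxxx -n xxxx
    kubectl delete secret xxxx -n xxxx

    # recreate them from the cleaned manifests
    kubectl apply -f tls-secret.yaml
    kubectl apply -f ingress.yaml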