kubernetes, google-cloud-platform, google-kubernetes-engine, kubernetes-ingress, http-response-codes

GKE basic-ingress intermittently returns 502 when backend returns 404/422


I have an ingress providing routing for two microservices running on GKE. Intermittently, when a microservice returns a 404/422, the ingress returns a 502 instead.

Here is my ingress definition:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: basic-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: develop-static-ip
    ingress.gcp.kubernetes.io/pre-shared-cert: dev-ssl-cert
spec:
  rules:
  - http:
      paths:
      - path: /*
        backend:
          serviceName: srv
          servicePort: 80
      - path: /c/*
        backend:
          serviceName: collection
          servicePort: 80
      - path: /w/*
        backend:
          serviceName: collection
          servicePort: 80
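
(For what it's worth, the extensions/v1beta1 Ingress API has since been removed from Kubernetes; on current clusters the equivalent manifest, assuming GKE's /* wildcard semantics are kept via pathType: ImplementationSpecific, would be a sketch like this:)

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: basic-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: develop-static-ip
    ingress.gcp.kubernetes.io/pre-shared-cert: dev-ssl-cert
spec:
  rules:
  - http:
      paths:
      - path: /*
        pathType: ImplementationSpecific
        backend:
          service:
            name: srv
            port:
              number: 80
      - path: /c/*
        pathType: ImplementationSpecific
        backend:
          service:
            name: collection
            port:
              number: 80
      - path: /w/*
        pathType: ImplementationSpecific
        backend:
          service:
            name: collection
            port:
              number: 80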

I run tests that hit the srv back-end where I expect a 404 or 422 response. I have verified that when I hit the srv back-end directly (bypassing the ingress), the service responds correctly with the 404/422.

When I issue the same requests through the ingress, the ingress will intermittently respond with a 502 instead of the 404/422 coming from the back-end.

How can I have the ingress just return the 404/422 response from the back-end?

Here's some example code to demonstrate the behavior I'm seeing (the expected status is 404):

>>> for i in range(10):
...     resp = requests.get('https://<server>/a/v0.11/accounts/junk', cookies=<token>)
...     print(resp.status_code)
...

502
502
404
502
502
404
404
502
404
404

And here are the same requests issued from a Python prompt within the pod, i.e. bypassing the ingress:

>>> for i in range(10):
...     resp = requests.get('http://0.0.0.0/a/v0.11/accounts/junk', cookies=<token>)
...     print(resp.status_code)
...
404
404
404
404
404
404
404
404
404
404

Here's the output of the kubectl commands to demonstrate that the load balancer is set up correctly (I never get a 502 for a 2xx/3xx response from the microservice):

$ kubectl get pods -o wide
NAME                          READY   STATUS    RESTARTS   AGE   IP          NODE                                     NOMINATED NODE   READINESS GATES
srv-799976fbcb-4dxs7          2/2     Running   0          19m   10.24.3.8   gke-develop-default-pool-ea507abc-43h7   <none>           <none>
srv-799976fbcb-5lh9m          2/2     Running   0          19m   10.24.1.7   gke-develop-default-pool-ea507abc-q0j3   <none>           <none>
srv-799976fbcb-5zvmv          2/2     Running   0          19m   10.24.2.9   gke-develop-default-pool-ea507abc-jjzg   <none>           <none>
collection-5d9f8586d8-4zngz   2/2     Running   0          19m   10.24.1.6   gke-develop-default-pool-ea507abc-q0j3   <none>           <none>
collection-5d9f8586d8-cxvgb   2/2     Running   0          19m   10.24.2.7   gke-develop-default-pool-ea507abc-jjzg   <none>           <none>
collection-5d9f8586d8-tzwjc   2/2     Running   0          19m   10.24.2.8   gke-develop-default-pool-ea507abc-jjzg   <none>           <none>
parser-7df86f57bb-9qzpn       1/1     Running   0          19m   10.24.0.8   gke-develop-parser-pool-5931b06f-6mcq    <none>           <none>
parser-7df86f57bb-g6d4q       1/1     Running   0          19m   10.24.5.5   gke-develop-parser-pool-5931b06f-9xd5    <none>           <none>
parser-7df86f57bb-jchjv       1/1     Running   0          19m   10.24.0.9   gke-develop-parser-pool-5931b06f-6mcq    <none>           <none>

$ kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
srv          NodePort    10.0.2.110   <none>        80:30141/TCP   129d
collection   NodePort    10.0.4.237   <none>        80:30270/TCP   129d
kubernetes   ClusterIP   10.0.0.1     <none>        443/TCP        130d

$ kubectl get endpoints
NAME         ENDPOINTS                                AGE
srv          10.24.1.7:80,10.24.2.9:80,10.24.3.8:80   129d
collection   10.24.1.6:80,10.24.2.7:80,10.24.2.8:80   129d
kubernetes   35.237.239.186:443                       130d

Solution

  • tl;dr: the GCP load balancer behind a GKE Ingress will return a 502 if the back-end's 404/422 responses don't have response bodies.

    Looking at the load balancer logs (see the first sketch below for one way to pull these), I would see the following errors:

    502: backend_connection_closed_before_data_sent_to_client
    404: backend_connection_closed_after_partial_response_sent
    

    Since everything was configured correctly (even the load balancer reported the backends as healthy, with no failed health checks), I experimented with a few things and noticed that all of my 404 responses had empty bodies.

    So I added a body to my 404 and 422 responses (sketched in the second block below), and lo and behold, no more 502s!
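
For reference, those statusDetails values come from the HTTP(S) load balancer's request logs in Cloud Logging. Assuming logging is enabled for the backend service, a filter along these lines surfaces them; the resource type and field names here are the standard ones for the external HTTP(S) load balancer, not anything specific to this cluster:

$ gcloud logging read \
    'resource.type="http_load_balancer" AND httpRequest.status=502' \
    --limit=10 \
    --format='value(jsonPayload.statusDetails)'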
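And since the question doesn't say what framework srv is built with, here is a minimal, hypothetical sketch of the fix as it might look in a small Flask app: the error handlers attach a short JSON body so the 404/422 responses are never empty.

from flask import Flask, jsonify

app = Flask(__name__)

@app.errorhandler(404)
def handle_not_found(e):
    # The non-empty body is the whole fix: per the findings above, the GCP
    # load balancer was turning body-less 404s into intermittent 502s.
    return jsonify(error="not found"), 404

@app.errorhandler(422)
def handle_unprocessable(e):
    return jsonify(error="unprocessable entity"), 422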