I have a StatefulSet, a Service with NEG, and an Ingress set up on a Google Kubernetes Engine (GKE) cluster.
Every workload and network object is ready and healthy. The Ingress is created and the NEG status is updated for all the services. The VPC-native (Alias IP) and HTTP Load Balancing options are enabled for the cluster.
But when I try to access my application using the path specified in my Ingress, I always get a 502 (Bad Gateway) error.
Here is my configuration (names, including the image name, are redacted):
apiVersion: v1
kind: Service
metadata:
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
  labels:
    app: myapp
  name: myapp
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: tcp
  selector:
    app: myapp
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: myapp
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  serviceName: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        livenessProbe:
          httpGet:
            path: /
            port: tcp
            scheme: HTTP
          initialDelaySeconds: 60
        image: myapp:8bebbaf
        ports:
        - containerPort: 1880
          name: tcp
          protocol: TCP
        readinessProbe:
          failureThreshold: 1
          httpGet:
            path: /
            port: tcp
            scheme: HTTP
        volumeMounts:
        - mountPath: /data
          name: data
      securityContext:
        fsGroup: 1000
      terminationGracePeriodSeconds: 10
  volumeClaimTemplates:
  - metadata:
      labels:
        app: myapp
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: myapp-ingress
spec:
  rules:
  - http:
      paths:
      - path: /workflow
        backend:
          serviceName: myapp
          servicePort: 80
What's wrong with it and how can I fix it?
After much digging and testing, I finally found what was wrong. It also seems that GKE NEG Ingress is not very stable (NEG is indeed still in beta) and does not always conform to the Kubernetes specs.
There was an issue with GKE Ingress related to named ports in the targetPort field. The fix is implemented and available from cluster version 1.16.0-gke.20 (see the GKE release notes), which as of today (February 2020) is available in the Rapid channel. I have not tested the fix, as I had other issues with an Ingress on a version from this channel.
So basically there are two options if you experience the same issue:
The first option is to specify the exact port number, not the port name, in the targetPort field of your Service. Here is the fixed Service config from my example:
apiVersion: v1
kind: Service
metadata:
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
  labels:
    app: myapp
  name: myapp
spec:
  ports:
  - port: 80
    protocol: TCP
    # !!! Use the numeric containerPort (1880) instead of the port name:
    # targetPort: tcp
    targetPort: 1880
  selector:
    app: myapp
The second option is to upgrade the GKE cluster to version 1.16.0-gke.20 or later (I haven't tested this myself).
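To make the relationship between the port name and the port number concrete, here is a minimal Python sketch of how a named targetPort maps to a containerPort. The helper name resolve_target_port is hypothetical (it is not part of any Kubernetes client library), and the dicts are trimmed-down copies of the manifests from the question. It only illustrates the lookup, not how GKE implements it.

```python
def resolve_target_port(service_port, containers):
    """Return the numeric port that a Service port entry points at.

    Hypothetical helper for illustration only: it mimics the lookup of a
    named targetPort against the containers' declared ports.
    """
    # When targetPort is omitted, it defaults to the Service's own port.
    target = service_port.get("targetPort", service_port["port"])
    if isinstance(target, int):
        return target  # already numeric, nothing to resolve
    # Otherwise treat it as a port name and search the container ports.
    for container in containers:
        for port in container.get("ports", []):
            if port.get("name") == target:
                return port["containerPort"]
    raise LookupError(f"no container port named {target!r}")


# Trimmed-down copies of the manifests from the question, as plain dicts.
service_port = {"port": 80, "protocol": "TCP", "targetPort": "tcp"}
containers = [
    {
        "name": "myapp",
        "ports": [{"containerPort": 1880, "name": "tcp", "protocol": "TCP"}],
    }
]

numeric_port = resolve_target_port(service_port, containers)
print(f"targetPort: {numeric_port}")  # prints "targetPort: 1880"
```

The printed number is the value to put in the Service's targetPort field if you take the first option.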