Tags: docker, kubernetes, google-cloud-platform, kubernetes-helm, fastapi

Why am I getting a 502 error for my FastAPI app deployed to K8s?


I'm having trouble deploying my FastAPI app to a k8s container in GCP. Even though I have a green checkmark indicating it's up and running, my log shows this:

[log screenshot showing repeated requests to the /alive endpoint]

When I run my app locally, it builds. When I build my image and container locally, Docker Desktop shows no issues, and it's not hitting this alive endpoint over and over. Before I added this endpoint to my app, the logs showed the app stopping and restarting again and again.

Other endpoints that are deployed don't have any of these logs, so I feel like something keeps restarting the app over and over and checking this liveness endpoint. What am I doing wrong here?

Dockerfile:

FROM python:3.9

#need to run virtualenv
RUN python3 -m venv /opt/venv



# Install system dependencies
RUN apt-get update

ENV PATH="${PATH}:/root/.poetry/bin"

# work directory for code
WORKDIR /app

ARG PIPCONF_B64
RUN mkdir -p ~/.pip && echo $PIPCONF_B64 | base64 -d > ~/.pip/pip.conf
RUN pip install poetry
# Copy project files, test files, and the dependency manifests
COPY . app
COPY requirements.txt /requirements.txt
COPY poetry.lock poetry.lock
COPY pyproject.toml pyproject.toml
RUN poetry export -f requirements.txt --output requirements.txt --without-hashes

# Install requirements
RUN pip3 install --upgrade pip
RUN pip3 install -r /requirements.txt
RUN . /opt/venv/bin/activate && pip install -r requirements.txt


ENV PYTHONPATH="${PYTHONPATH}:/appstuff"

EXPOSE 80
CMD ["uvicorn", "main:app", "--proxy-headers", "--host", "0.0.0.0", "--port", "80"]

my deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: "{{ .Release.Name }}-deployment"
spec:
  revisionHistoryLimit: 5
  {{- if not .Values.hpa.enabled }}
  replicas: {{ .Values.replicas }}
  {{ end }}
  selector:
    matchLabels:
      app: "{{ .Release.Name }}"
{{ toYaml .Values.labels | indent 6 }}
  template:
    metadata:
      annotations:
        ad.datadoghq.com/postgres.logs: '[{"source": ...}]'
      labels:
        app: "{{ .Release.Name }}"
{{ toYaml .Values.labels | indent 8 }}
    spec:
      serviceAccountName: "{{ .Release.Name }}-sa"
      containers:
        - name: "{{ .Release.Name }}"
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: "{{ .Values.image.pullPolicy }}"
          command: ["uvicorn"]
          args: ["main:app", "--proxy-headers", "--host", "0.0.0.0", "--port", "80"]
          ports:
            - name: ui
              containerPort: 80
              protocol: TCP
          envFrom:
            - configMapRef:
                name: "{{ .Release.Name }}-configmap"
#            - secretRef:
#                name: my-secret-name
          livenessProbe:
            httpGet:
              path: /alive
              port: 80
            initialDelaySeconds: 120
          resources:
{{ toYaml .Values.resources | indent 12 }}
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 33%
      maxUnavailable: 33%

my config.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: "{{ .Release.Name }}-configmap"
  annotations:
    meta.helm.sh/release-name: "{{ .Release.Name }}"
    meta.helm.sh/release-namespace: "{{ .Release.Namespace }}"
  labels:
    app.kubernetes.io/managed-by: Helm
data:
{{ toYaml .Values.envConfig | indent 2 }}


Solution

  • 5xx errors such as 502 occur when the server is unable to process or serve the client’s request. This blog, written by Nir Shtein, outlines the various points of failure that can cause 5xx errors when an application is deployed on a Kubernetes cluster.

    Consider a typical scenario in which you map a Service to a container within a pod, and the client is attempting to access an application running on that container. This creates several points of failure:

    • The pod
    • The container
    • Network ports exposed on the container
    • The Service
    • The Ingress
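    One of these failure points, the network port exposed on the container, can be ruled out with a plain TCP connect check (for example from another pod, or locally via `kubectl port-forward`). Below is a minimal sketch; the host and port in the usage comment are placeholders, not values from your cluster:

```python
import socket

def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (placeholder address): check the containerPort the Deployment exposes.
# port_is_open("10.0.0.12", 80)
```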

    Based on your description, the main reason for the 502 errors is that your containers are restarting continuously.

    A pod or container restart might occur for reasons like excessive resource utilization (CPU or memory overshoot), or because the pod shuts down prematurely due to an error in application code. Follow this blog for more information regarding container restarts.
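    To make the "memory overshoot" case concrete: a container is OOMKilled (and restarted) when its memory usage exceeds its configured limit. Here is a small, hedged helper for comparing a usage reading against a limit; it covers only the common quantity suffixes and is an illustration, not part of any Kubernetes client library:

```python
# Convert a Kubernetes-style memory quantity (e.g. "512Mi", "1G") to bytes.
# Binary suffixes are listed first so "Mi" is not mistaken for "M".
_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
             "K": 1000, "M": 1000**2, "G": 1000**3}

def to_bytes(quantity: str) -> int:
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)  # plain bytes, no suffix

def over_limit(usage: str, limit: str) -> bool:
    """True if usage exceeds the limit, i.e. the container risks an OOMKill."""
    return to_bytes(usage) > to_bytes(limit)
```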

    Troubleshooting steps:

    Check the pod logs for more information on why the pod is getting restarted. You can use the commands below for this:

     `kubectl logs <pod>` or `kubectl describe pod <pod>`
    

    Pod restarts might also happen due to high CPU or memory utilization. Use the command below to find the pods consuming the most resources:

     `kubectl top pods`
    

    Check whether your application pods appear in that list; if one shows high memory utilization, log in to it and check which processes are consuming the most resources.

    Sometimes an improper autoscaler configuration will also cause container or pod restarts; check your autoscaler config and correct any misconfiguration you find.
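    For reference, a minimal HorizontalPodAutoscaler sketch in your chart's naming style; the replica bounds and CPU threshold are example values, not recommendations:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: "{{ .Release.Name }}-hpa"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: "{{ .Release.Name }}-deployment"
  minReplicas: 2           # example bounds
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # example threshold; requires metrics-server
```

    Your Deployment template already branches on `.Values.hpa.enabled`, so a resource like this would pair with that flag.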

    Here is an additional reference on pod restarts; go through it for more information and debugging steps.
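    Finally, on the repeated /alive entries themselves: once a livenessProbe is configured, the kubelet polls the probe path every `periodSeconds` (10 by default) for as long as the pod runs, so those log lines by themselves are expected and harmless. It is probe failures (reaching `failureThreshold`, 3 by default) that trigger restarts. Below is a self-contained sketch of that polling behavior using only the Python standard library; the real probing is done by the kubelet, not by application code:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class AliveHandler(BaseHTTPRequestHandler):
    """Stand-in for the FastAPI app's /alive endpoint."""
    def do_GET(self):
        if self.path == "/alive":
            body = b'{"status": "ok"}'
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, fmt, *args):
        pass  # silence per-request logging for the demo

def probe(host: str, port: int, path: str = "/alive") -> int:
    """One kubelet-style liveness check: GET the path and return the HTTP status."""
    conn = http.client.HTTPConnection(host, port, timeout=3)
    conn.request("GET", path)
    status = conn.getresponse().status
    conn.close()
    return status

if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), AliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    # The kubelet repeats this call every periodSeconds; each one shows up
    # in the app's access log, which is what the screenshot's log lines are.
    statuses = [probe("127.0.0.1", port) for _ in range(3)]
    print(statuses)  # [200, 200, 200]: repeated hits are normal
    server.shutdown()
```

    So the probe traffic is not the problem to chase; the thing to explain is why the container was restarting before the endpoint existed, which the steps above should surface.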