I have a compose file with three services (database, backend and frontend). Backend depends on database being healthy, and frontend depends on backend being healthy.
The database (Postgres) checks its own health using pg_isready, and the backend (FastAPI) checks its health via an endpoint, http://localhost:8080/healthcheck.
Compose file:
version: '3'
services:
  database:
    image: postgres:14-alpine
    healthcheck:
      test: pg_isready -U postgres
      interval: 1s
      timeout: 5s
      retries: 5
      start_period: 10s
  backend:
    depends_on:
      database:
        condition: service_healthy
    image: backend-api-image
    build:
      context: backend
      dockerfile: Dockerfile
    ports:
      - "8080:8080"
    volumes:
      - './backend:/backend'
    healthcheck:
      test: wget --no-verbose --tries=1 --spider http://localhost:8080/healthcheck || exit 1
      interval: 1s
      timeout: 5s
  frontend:
    image: my-frontend
    depends_on:
      backend:
        condition: service_healthy
    build:
      context: ./frontend
      dockerfile: Dockerfile
FastAPI app:
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


@app.get('/healthcheck')
def get_healthcheck():
    return 'OK'
So far this all works as expected. If, for example, I had a typo in my healthcheck
endpoint route (in my app), startup would fail, like so:
database | 2023-06-01 23:01:44.410 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
database | 2023-06-01 23:01:44.410 UTC [1] LOG: listening on IPv6 address "::", port 5432
database | 2023-06-01 23:01:44.411 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
database | 2023-06-01 23:01:44.414 UTC [22] LOG: database system was shut down at 2023-06-01 22:51:10 UTC
database | 2023-06-01 23:01:44.417 UTC [1] LOG: database system is ready to accept connections
backend | INFO: Will watch for changes in these directories: ['/backend']
backend | INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
backend | INFO: Started reloader process [1] using StatReload
backend | INFO: Started server process [8]
backend | INFO: Waiting for application startup.
backend | INFO: Application startup complete.
backend | INFO: 127.0.0.1:41294 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:41296 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:41298 - "GET /healthcheck HTTP/1.1" 404 Not Found
dependency failed to start: container backend is unhealthy
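Three failed probes before Compose gives up is consistent with Docker's default retries value of 3, since the backend healthcheck above doesn't set retries. A simplified sketch of how the status moves, assuming no start_period and no probe timeouts (the function name is made up for illustration):

```python
def health_status(probe_results, retries=3):
    """Return the health status after a sequence of probe results.

    Simplified model of Docker's logic: ignores start_period and timeouts.
    probe_results is an iterable of bools, True meaning the probe exited 0.
    """
    status = "starting"
    failing_streak = 0
    for ok in probe_results:
        if ok:
            # Any success resets the streak and marks the container healthy
            failing_streak = 0
            status = "healthy"
        else:
            failing_streak += 1
            # Only `retries` consecutive failures flip the status
            if failing_streak >= retries:
                status = "unhealthy"
    return status


print(health_status([False, False, False]))  # unhealthy
print(health_status([False, False]))         # starting
print(health_status([True, False, False]))   # healthy
```

Note that a healthy container only flips to unhealthy after a full streak of failures, so a single failed probe is not enough.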
Where I'm getting confused is that, after a successful startup, if I change the app in a way that makes the backend
unhealthy, the container detects the change and the check returns a 404
(as expected), but the container never becomes unhealthy.
database | 2023-06-01 23:06:37.396 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
database | 2023-06-01 23:06:37.396 UTC [1] LOG: listening on IPv6 address "::", port 5432
database | 2023-06-01 23:06:37.397 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
database | 2023-06-01 23:06:37.400 UTC [22] LOG: database system was shut down at 2023-06-01 23:06:34 UTC
database | 2023-06-01 23:06:37.403 UTC [1] LOG: database system is ready to accept connections
backend | INFO: Will watch for changes in these directories: ['/backend']
backend | INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
backend | INFO: Started reloader process [1] using StatReload
backend | INFO: Started server process [9]
backend | INFO: Waiting for application startup.
backend | INFO: Application startup complete.
backend | INFO: 127.0.0.1:49450 - "GET /healthcheck HTTP/1.1" 200 OK
frontend |
frontend | > frontend@0.0.0 dev
frontend | > vite --host
frontend |
frontend | Forced re-optimization of dependencies
frontend |
frontend | VITE v4.3.1 ready in 285 ms
frontend |
frontend | ➜ Local: http://localhost:5173/
frontend | ➜ Network: http://172.26.0.4:5173/
backend | INFO: 127.0.0.1:57966 - "GET /healthcheck HTTP/1.1" 200 OK
backend | INFO: 127.0.0.1:57968 - "GET /healthcheck HTTP/1.1" 200 OK
backend | INFO: 127.0.0.1:57982 - "GET /healthcheck HTTP/1.1" 200 OK
backend | INFO: 127.0.0.1:57992 - "GET /healthcheck HTTP/1.1" 200 OK
backend | INFO: 127.0.0.1:58002 - "GET /healthcheck HTTP/1.1" 200 OK
backend | INFO: 127.0.0.1:58012 - "GET /healthcheck HTTP/1.1" 200 OK
backend | INFO: 127.0.0.1:58018 - "GET /healthcheck HTTP/1.1" 200 OK
backend | WARNING: StatReload detected changes in 'src/main.py'. Reloading...
backend | INFO: Shutting down
backend | INFO: Waiting for application shutdown.
backend | INFO: Application shutdown complete.
backend | INFO: Finished server process [9]
backend | INFO: Started server process [76]
backend | INFO: Waiting for application startup.
backend | INFO: Application startup complete.
backend | INFO: 127.0.0.1:58028 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:58040 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:35092 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:35098 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:35102 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:35116 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:35126 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:35134 - "GET /healthcheck HTTP/1.1" 404 Not Found
What I expected:
While running after a successful startup, upon changing the backend
code in such a way that its healthcheck would fail, I expected frontend
to exit or become degraded somehow, as its health dependency has failed.
What happened:
Everything kept running as if nothing happened, even though the backend
healthcheck returned a failing value.
My questions:
Why is the backend container not being marked as unhealthy when changes cause its healthcheck to fail while running?
I could use kill 1 instead of exit 1 in the healthcheck, and that would cause the backend container to stop, but it doesn't seem very clean.

In trying to reproduce the behavior you've described, the first problem I ran into is that the standard version of wget makes HEAD requests when using the --spider option, so your healthcheck results in:
HEAD /healthcheck HTTP/1.1" 405 Method Not Allowed
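The method difference is easy to see with a throwaway stdlib server that, like the @app.get route, implements only GET. This is a sketch, not your FastAPI app: Python's http.server rejects an unimplemented method with 501 rather than FastAPI's 405, but either way a HEAD-based probe fails while a GET succeeds.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class GetOnlyHandler(BaseHTTPRequestHandler):
    """Implements GET only, like a FastAPI route declared with @app.get."""

    def do_GET(self):
        body = b'"OK"'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def status_for(url, method):
    """Return the HTTP status code the server sends for the given method."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


# Port 0 lets the OS pick a free port for this throwaway server
server = HTTPServer(("127.0.0.1", 0), GetOnlyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/healthcheck"

get_status = status_for(url, "GET")    # what wget sends without --spider
head_status = status_for(url, "HEAD")  # what GNU wget sends with --spider
server.shutdown()
server.server_close()
print(get_status, head_status)  # 200 501
```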
That 405 comes from wget version 1.21 as installed in the python:3.11 image. I modified the healthcheck to look like this (and dropped the irrelevant parts of your docker-compose.yaml):
version: '3'
services:
  backend:
    image: backend-api-image
    build:
      context: backend
      dockerfile: Dockerfile
    ports:
      - "8080:8080"
    volumes:
      - './backend:/backend'
    healthcheck:
      test: wget --no-verbose -O /dev/null --tries=1 http://localhost:8080/healthcheck || exit 1
      interval: 1s
      timeout: 5s
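If you'd rather not depend on which wget the image ships, a small Python probe is another option, since the python:3.11 base already has Python. A minimal sketch (the file name healthcheck.py and the helper names are hypothetical) that always sends GET and maps the result to the exit status Docker expects:

```python
import urllib.request

# Hypothetical default, matching the port and route in the compose file
DEFAULT_URL = "http://localhost:8080/healthcheck"


def is_healthy(url=DEFAULT_URL, timeout=5.0):
    """Send a GET request (not HEAD) and report whether the reply was 2xx."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # urllib.error.HTTPError (4xx/5xx) and URLError (connection refused,
        # DNS failure) are both OSError subclasses, as are socket timeouts
        return False


def exit_code(url=DEFAULT_URL):
    """Map the result to Docker's convention: 0 healthy, 1 unhealthy."""
    return 0 if is_healthy(url) else 1
```

Saved next to backend.py (so COPY . ./ puts it in /app), one way to call it would be test: ["CMD", "python", "-c", "import sys, healthcheck; sys.exit(healthcheck.exit_code())"]. That assumes the probe runs in the image's WORKDIR so the module can be imported.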
I have your example FastAPI code in backend/backend.py, and my backend/Dockerfile looks like:
FROM python:3.11
WORKDIR /app
RUN python3 -m venv .venv
ENV PATH=/app/.venv/bin:/usr/local/bin:/usr/bin:/bin
COPY requirements.txt ./
RUN . .venv/bin/activate && pip install -r requirements.txt
COPY . ./
CMD ["uvicorn", "--reload", "--host", "0.0.0.0", "--port", "8080", "backend:app"]
When I run docker-compose up, I see:
backend_1 | INFO: 127.0.0.1:44856 - "GET /healthcheck HTTP/1.1" 200 OK
backend_1 | INFO: 127.0.0.1:44884 - "GET /healthcheck HTTP/1.1" 200 OK
...and the container enters the "healthy" state:
NAME IMAGE COMMAND SERVICE CREATED STATUS PORTS
webserver_backend_1 backend-api-image "uvicorn --reload --…" backend 24 seconds ago Up 23 seconds (healthy) 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp
If I docker exec into the container and modify the FastAPI application so that the healthcheck endpoint returns an error, I see:
backend_1 | WARNING: StatReload detected changes in 'backend.py'. Reloading...
backend_1 | INFO: Shutting down
backend_1 | INFO: Waiting for application shutdown.
backend_1 | INFO: Application shutdown complete.
backend_1 | INFO: Finished server process [8]
backend_1 | INFO: Started server process [1050]
backend_1 | INFO: Waiting for application startup.
backend_1 | INFO: Application startup complete.
backend_1 | INFO: 127.0.0.1:44618 - "GET /healthcheck HTTP/1.1" 400 Bad Request
backend_1 | INFO: 127.0.0.1:48912 - "GET /healthcheck HTTP/1.1" 400 Bad Request
And the container enters the "unhealthy" state:
NAME IMAGE COMMAND SERVICE CREATED STATUS PORTS
webserver_backend_1 backend-api-image "uvicorn --reload --…" backend 2 minutes ago Up 2 minutes (unhealthy) 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp
That all seems to work as expected: the container health status changes as the response from the FastAPI service changes.
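To watch that transition from a script instead of re-running docker-compose ps, you can read the health record Docker keeps for the container. A minimal sketch, assuming the Docker CLI is on the PATH; the helper names are hypothetical:

```python
import json
import subprocess


def health_summary(health_json):
    """Summarize the JSON from `docker inspect --format '{{json .State.Health}}'`.

    Returns (status, failing_streak, exit_code_of_last_probe).
    """
    health = json.loads(health_json)
    if health is None:
        raise ValueError("container has no healthcheck configured")
    log = health.get("Log") or []
    last_exit = log[-1]["ExitCode"] if log else None
    return health["Status"], health["FailingStreak"], last_exit


def inspect_health(container):
    """Query the Docker CLI for a container's current health record."""
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{json .State.Health}}", container],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return health_summary(out)
```

With the container name from the output above, inspect_health("webserver_backend_1") should report "unhealthy" and a non-zero last exit code once the endpoint breaks. Keep in mind that Compose evaluates condition: service_healthy only when it starts the dependent service, so a backend that turns unhealthy later does not by itself stop or restart frontend; a watcher like this is one place to react.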
Here are some questions to help further diagnose things on your end:
What does the Dockerfile for your FastAPI service look like? In particular, what's the base image?
Have you verified that the wget command in that image returns an error code as expected for a non-200 response from the server?
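One way to check is to run the same string Docker would, through /bin/sh -c (which is how a plain-string test: is executed), and look at the exit status. A sketch assuming Python is available in the image; the /does-not-exist route is hypothetical and only there to provoke a 404:

```python
import subprocess


def probe_exit_code(command, timeout=5.0):
    """Run a healthcheck string the way a CMD-SHELL test runs it; return its exit status."""
    result = subprocess.run(
        ["/bin/sh", "-c", command],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode


# Hypothetical route that doesn't exist, so the server should answer 404
CHECK = "wget --no-verbose -O /dev/null --tries=1 http://localhost:8080/does-not-exist"
```

Running print(probe_exit_code(CHECK)) inside the backend container should print a non-zero value if that wget treats a 404 as an error (GNU wget documents exit status 8 for a server error response). A 0 there would explain a healthcheck that never fails.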