SIGTERM not sent on pod delete

When deleting a pod or deploying a new version of a pod, Kubernetes should send a SIGTERM to the running process and then wait terminationGracePeriodSeconds (30 seconds by default) before sending a SIGKILL.

I have encountered the problem that this first SIGTERM never seems to be sent. The default settings in my cluster were never changed (the SIGKILL is sent as expected after 30 seconds), so my assumption is that something is wrong with my Dockerfile, such as permissions (see below).

I've ruled out an error in the executable's graceful shutdown logic that catches the SIGTERM: when I kubectl exec into the pod and run kill -15 on the process, it shuts down as expected.

The Dockerfile looks as follows:

FROM debian:bullseye-slim AS app

ARG USERNAME=app
ARG USER_UID=1000
ARG USER_GID=$USER_UID
RUN apt update && apt install -y libssl-dev zstd ca-certificates pkg-config

RUN groupadd --gid $USER_GID $USERNAME \
    && useradd --uid $USER_UID --gid $USER_GID -m $USERNAME
WORKDIR /home/$USERNAME

ARG RELEASE_DIR
ARG SERVICE 

USER $USERNAME

COPY $RELEASE_DIR .

EXPOSE 8080

ENV CMD=./${SERVICE}
CMD ${CMD}

Is there something blatantly wrong here? Or does Kubernetes require some additional config to actually send the termination signal as expected?

Solution

For termination to work correctly, you need to ensure your application is the main container process. With the shell form of CMD, the command your container runs is /bin/sh -c '${CMD}'. Depending on what's in that environment variable and what /bin/sh actually is, that shell wrapper might keep running as the main container process (PID 1) and receive the termination signal without passing it on to your application.
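To see the difference between a shell that stays around as a wrapper and a command that becomes the process itself, here is a minimal sketch you can run without Docker. It only illustrates the process relationship, not the exact behavior inside your image; the variable names are made up for the example.

```shell
#!/bin/sh
# Case 1: the outer shell keeps running and starts the inner command as a
# child process, so the two printed PIDs differ. The trailing ":" ensures the
# inner command is not the last one, so the shell cannot replace itself with it.
wrapped=$(sh -c 'echo $$; sh -c "echo \$\$"; :')

# Case 2: "exec" replaces the outer shell with the inner command, so both
# lines show the same PID. There is no wrapper left to receive signals.
replaced=$(sh -c 'echo $$; exec sh -c "echo \$\$"')

printf 'wrapper shell (two different PIDs):\n%s\n' "$wrapped"
printf 'exec (same PID twice):\n%s\n' "$replaced"
```

In the container, the first case corresponds to /bin/sh sitting at PID 1 with your service as its child, which is the situation where the signal may never reach the service.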

The same mechanisms apply in both plain Docker and Kubernetes, and you should see a similar issue if you docker stop the container locally, which may be easier to debug and iterate on.
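One rough way to check this locally is to time docker stop. By default it sends SIGTERM and falls back to SIGKILL after 10 seconds, so a stop that takes about that long suggests the signal was not handled. The helper name stop_seconds and the container name sigterm-test below are hypothetical, and my-image stands in for whatever you built.

```shell
#!/bin/sh
# Print how many whole seconds `docker stop` takes for the given container.
stop_seconds() {
    start=$(date +%s)
    docker stop "$1" >/dev/null
    end=$(date +%s)
    echo $((end - start))
}

# Example usage (assumes an image tagged my-image exists locally):
#   docker run -d --name sigterm-test my-image
#   stop_seconds sigterm-test
#   docker rm sigterm-test
```

A result close to 0 means the process exited on SIGTERM; a result around 10 means Docker had to fall back to SIGKILL.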

The easiest way to work around this is to use the exec form of CMD, which looks like a JSON array. Since this won't run a shell, it also can't do variable expansions, and you'll have to spell out what you want the command to actually be:

CMD ["./service"]

This is still easy to override at runtime, and you in fact don't need the CMD environment variable at all:

# instead of `docker run -e CMD='...'`
docker run --rm my-image \
  ls -l /home/app

# or in a Kubernetes pod spec
command:
  - /home/app/another_app
args:
  - --option

You can probably also remove pretty much all of the ARG declarations in the Dockerfile, which will simplify the setup. The name or numeric uid of the container user shouldn't matter, for example, and the compiled application filename and host build path are usually fixed.