I'm using pulumi to manage kubernetes deployments. One of the deployments runs an image which intercepts SIGINT and SIGTERM signals to perform a graceful shutdown like so (this example is running in my IDE):
{"level":"info","msg":"trying to activate gcloud service account","time":"2021-06-17T12:19:25-05:00"}
{"level":"info","msg":"env var not found","time":"2021-06-17T12:19:25-05:00"}
{"Namespace":"default","TaskQueue":"main-task-queue","WorkerID":"37574@Paymahns-Air@","level":"error","msg":"Started Worker","time":"2021-06-17T12:19:25-05:00"}
{"Namespace":"default","Signal":"interrupt","TaskQueue":"main-task-queue","WorkerID":"37574@Paymahns-Air@","level":"error","msg":"Worker has been stopped.","time":"2021-06-17T12:19:27-05:00"}
{"Namespace":"default","TaskQueue":"main-task-queue","WorkerID":"37574@Paymahns-Air@","level":"error","msg":"Stopped Worker","time":"2021-06-17T12:19:27-05:00"}
Notice the "Signal":"interrupt"
with a message of Worker has been stopped
.
I find that when I alter the source code (which alters the docker image) and run pulumi up
the pod doesn't gracefully terminate based on what's described in this blog post. Here's a screenshot of logs from GCP:
The highlighted log line in the image above is the first log line emitted by the app. Note that the shutdown messages aren't logged above the highlighted line which suggests to me that the pod isn't given a chance to perform a graceful shutdown.
Why might the pod not go through the graceful shutdown mechanisms that kubernetes offers? Could this be a bug with how pulumi
performs updates to deployments?
EDIT: after doing more investigation I found that this problem is happening because starting a docker container with go run /path/to/main.go
actually ends up created two processes like so (after exec
ing into the container):
root@worker-ffzpxpdm-78b9797dcd-xsfwr:/gadic# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.3 0.3 2046200 30828 ? Ssl 18:04 0:12 go run src/cmd/worker/main.go --temporal-host temporal-server.temporal.svc.cluster.local --temporal-port 7233 --grpc-port 6789 --grpc-hos
root 3782 0.0 0.5 1640772 43232 ? Sl 18:06 0:00 /tmp/go-build2661472711/b001/exe/main --temporal-host temporal-server.temporal.svc.cluster.local --temporal-port 7233 --grpc-port 6789 --
root 3808 0.1 0.0 4244 3468 pts/0 Ss 19:07 0:00 /bin/bash
root 3817 0.0 0.0 5900 2792 pts/0 R+ 19:07 0:00 ps aux
If run kill -TERM 1
then the signal isn't forwarded to the underlying binary, /tmp/go-build2661472711/b001/exe/main
, which means the graceful shutdown of the application isn't executed. However, if I run kill -TERM 3782
then the graceful shutdown logic is executed.
It seems the go run
spawns a subprocess and this blog post suggests the signals are only forwarded to PID 1. On top of that, it's unfortunate that go run
doesn't forward signals to the subprocess it spawns.
The solution I found is to add RUN go build -o worker /path/to/main.go
in my dockerfile and then to start the docker container with ./worker --arg1 --arg2
instead of go run /path/to/main.go --arg1 --arg2
.
Doing it this way ensures there aren't any subprocess spawns by go
and that ensures signals are handled properly within the docker container.