amazon-web-services docker symfony amazon-ecs messenger

Symfony Messenger not shutting down gracefully when using APP_ENV=prod

We are using Symfony Messenger in combination with supervisor running in a Docker container on AWS ECS. We noticed the worker is not shut down gracefully. After debugging it appears it does work as expected when using APP_ENV=dev, but not when APP_ENV=prod.

I made a simple sleepMessage, which sleeps for 1 second and then prints a message for 60 seconds. This is when running with APP_ENV=dev

As you can see it's clearly waiting for the program to stop running. Now with APP_ENV=prod:

It stops immediately without waiting.

In the Dockerfile we have configured the following to start supervisor. It's based on php:8.1-apache, so that's why STOPSIGNAL has been configured

RUN apt-get update && apt-get install -y --no-install-recommends \
    # for supervisor
    python \
    supervisor

The start-worker.sh script contains this

#!/usr/bin/env bash

cp config/worker/messenger-worker.conf ../../../etc/supervisor/supervisord.conf
exec /usr/bin/supervisord

We do this because certain env variables are only available when starting up. For debugging purposes the config has been hardcoded to test. Below is the messenger-worker.conf

[unix_http_server]
file=/tmp/supervisor.sock

[supervisord]
nodaemon=true               ; start in foreground if true; default false

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[program:messenger-consume]
stderr_logfile_maxbytes=0
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
command=bin/console messenger:consume async -vv --env=prod --time-limit=129600
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
numprocs=1
environment=
    MESSENGER_TRANSPORT_DSN="https://sqs.eu-central-1.amazonaws.com/{id}/dev- 
    symfony-messenger-queue"

So in short, when using --env=prod in the config above it doesn't wait for the worker to stop, while with --env=dev it does. Does anybody know how to solve this?

Solution

Turns out it was related to the wait_time option related to SQS transports. It probably caused a request that was started just before the container exited and was sent back when the container did not exist anymore. So, wait_time to 0 fixed that problem.

Then there was this which could lead to the same issue