I'm running into an annoying issue on my production machine, where my Celery worker runs in a Docker container configured like this:
```yaml
worker:
  build: .
  env_file:
    - .env
  command: celery -A my_app worker --loglevel=info --concurrency 1 -E
  deploy:
    restart_policy:
      condition: on-failure
      delay: 5s
      max_attempts: 3
      window: 120s
  depends_on:
    - api
```
My issue is that, sadly, this worker often runs out of memory (despite setting `worker_max_tasks_per_child = 1000`), throwing this error:
```
[2024-03-19 17:23:24,533: CRITICAL/MainProcess] Unrecoverable error: MemoryError()
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/celery/worker/worker.py", line 203, in start
  File "/usr/local/lib/python3.10/site-packages/celery/bootsteps.py", line 116, in start
  File "/usr/local/lib/python3.10/site-packages/celery/bootsteps.py", line 365, in start
  File "/usr/local/lib/python3.10/site-packages/celery/worker/consumer/consumer.py", line 332, in start
  File "/usr/local/lib/python3.10/site-packages/celery/bootsteps.py", line 116, in start
    step.start(parent)
  File "/usr/local/lib/python3.10/site-packages/celery/worker/consumer/consumer.py", line 628, in start
    c.loop(*c.loop_args())
  File "/usr/local/lib/python3.10/site-packages/celery/worker/loops.py", line 97, in asynloop
    next(loop)
  File "/usr/local/lib/python3.10/site-packages/kombu/asynchronous/hub.py", line 362, in create_loop
  File "/usr/local/lib/python3.10/site-packages/kombu/transport/redis.py", line 1326, in on_readable
  File "/usr/local/lib/python3.10/site-packages/kombu/transport/redis.py", line 562, in on_readable
  File "/usr/local/lib/python3.10/site-packages/kombu/transport/redis.py", line 955, in _brpop_read
  File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 1275, in parse_response
  File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 865, in read_response
  File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 346, in read_response
  File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 356, in _read_response
  File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 259, in readline
  File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 209, in _read_from_socket
MemoryError
```
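Note that the log line says `CRITICAL/MainProcess`, and the stack goes through the consumer loop reading from Redis. A per-child task limit such as `worker_max_tasks_per_child` recycles the worker's pool *child* processes, not the main process. The sketch below uses the standard library's analogous `multiprocessing.Pool(maxtasksperchild=...)` (not Celery itself) to show that behaviour: the children get replaced, the parent never does. The helper names `child_pid` and `run_tasks` are hypothetical, and this is an illustration of the mechanism, not a confirmed diagnosis of the crash.

```python
import multiprocessing
import os


def child_pid(_task_index: int) -> int:
    """Stand-in 'task' that only reports which process executed it."""
    return os.getpid()


def run_tasks(n_tasks: int = 6, max_tasks_per_child: int = 2) -> list:
    # "fork" keeps the sketch self-contained on Linux (the Docker host's OS).
    ctx = multiprocessing.get_context("fork")
    # One pool child, replaced after every `max_tasks_per_child` tasks.
    with ctx.Pool(processes=1, maxtasksperchild=max_tasks_per_child) as pool:
        return pool.map(child_pid, range(n_tasks), chunksize=1)


if __name__ == "__main__":
    pids = run_tasks()
    print("parent (never recycled):", os.getpid())
    print("distinct pool children:", len(set(pids)))
```

With 6 tasks and a limit of 2 per child, several different child PIDs show up while the parent PID stays the same, so memory that grows in the parent is never reclaimed by this setting.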
And the REAL BIG issue for me right now is that the Docker container never restarts after this crash, and I have no idea why! My Django container also gets memory errors and seems to restart fine (with the exact same `restart_policy`), but not this one...
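One thing worth checking is the worker's exit status: with `condition: on-failure`, Docker only restarts a container whose main process exits with a non-zero code, and gives up after `max_attempts`. A process that logs a fatal error but then exits with 0 would not be restarted. Whether that is what happened here is an assumption, not something the post confirms. The sketch below runs two hypothetical child interpreters (one lets `MemoryError` escape, one logs it and exits normally) and feeds their exit codes to a deliberately simplified model of the restart policy; `should_restart` is a hypothetical helper, not Docker's implementation, and it ignores `delay` and `window`.

```python
import subprocess
import sys

# Hypothetical worker that lets the MemoryError escape (non-zero exit).
CRASHES = "raise MemoryError()"
# Hypothetical worker that logs the error, then returns normally (exit 0).
LOGS_AND_EXITS = (
    "import sys\n"
    "try:\n"
    "    raise MemoryError()\n"
    "except MemoryError:\n"
    "    print('Unrecoverable error: MemoryError()', file=sys.stderr)\n"
)


def exit_code_of(snippet: str) -> int:
    """Run a Python snippet in a child interpreter and return its exit status."""
    return subprocess.run([sys.executable, "-c", snippet], capture_output=True).returncode


def should_restart(exit_code: int, restarts_so_far: int, max_attempts: int = 3) -> bool:
    """Simplified model of `condition: on-failure` plus `max_attempts`."""
    return exit_code != 0 and restarts_so_far < max_attempts


if __name__ == "__main__":
    for name, snippet in [("crashes", CRASHES), ("logs_and_exits", LOGS_AND_EXITS)]:
        code = exit_code_of(snippet)
        print(name, code, should_restart(code, restarts_so_far=0))
```

The uncaught exception exits with status 1 and would be restarted under this model, while the "log and exit" variant exits with 0 and would not. On a real container, `docker inspect` reports the last exit code in its `State` section.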
I tried setting the container's `mem_limit` to an arbitrary value (1/4 of the host's RAM), since I read that a MemoryError might stop the container without reporting any error (although, as I said, my Django container has already hit MemoryErrors and restarted fine), but to no avail.
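For context on `mem_limit`: when a container exceeds its memory limit, the kernel's OOM killer typically terminates the process with SIGKILL instead of Python raising a `MemoryError`, and Docker then reports exit code 137 (128 + 9), which counts as a failure for `on-failure`. The sketch below reproduces only the signal part locally, without Docker: it kills a child interpreter with SIGKILL and converts `subprocess`'s negative return code to the 128+N form. The helper names are hypothetical, and SIGKILL is POSIX-only.

```python
import signal
import subprocess
import sys


def shell_style_status(returncode: int) -> int:
    """Map subprocess's 'killed by signal N' code (-N) to the 128+N form Docker reports."""
    return 128 - returncode if returncode < 0 else returncode


def kill_child_with_sigkill() -> int:
    # Start a child that would sleep for a minute, then kill it the way the OOM killer does.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    child.send_signal(signal.SIGKILL)
    child.wait()
    return child.returncode


if __name__ == "__main__":
    rc = kill_child_with_sigkill()
    print(rc, shell_style_status(rc))  # on Linux: -9 137
```

This is why a memory limit tends to produce a visible, restartable failure, while an in-process `MemoryError` depends entirely on how the program exits afterwards.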
It was actually a bug in Celery; upgrading to version 5.3.6 fixed my issue.
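If you want the worker image to fail loudly when it is built with an older Celery, a small startup check can compare the installed version against 5.3.6 (the version reported above as fixing the problem). This is a minimal sketch: `release_tuple` and `installed_celery_is_patched` are hypothetical helpers, and the parsing is deliberately simplified (pre-release suffixes such as `rc1` are dropped), so `packaging.version` is the more robust choice in real code.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

MINIMUM_CELERY = "5.3.6"  # version reported to fix the crash


def release_tuple(v: str) -> Tuple[int, ...]:
    """Parse the leading numeric release segment, e.g. '5.3.6rc1' -> (5, 3, 6)."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"unrecognised version string: {v!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def installed_celery_is_patched(minimum: str = MINIMUM_CELERY) -> Optional[bool]:
    """True/False if Celery is installed, None if it is not installed at all."""
    try:
        installed = version("celery")
    except PackageNotFoundError:
        return None
    return release_tuple(installed) >= release_tuple(minimum)


if __name__ == "__main__":
    print("celery patched:", installed_celery_is_patched())
```

Comparing integer tuples rather than raw strings avoids the classic mistake where `"5.10.0" < "5.3.6"` as text.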