python docker docker-compose celery out-of-memory

Celery worker container never restarts after getting MemoryError


I am getting this annoying issue on my production machine, where I have a Docker container for my Celery worker, configured like this:

worker:
    build: .
    env_file:
      - .env
    command: celery -A my_app worker --loglevel=info --concurrency 1 -E
    deploy:
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 120s
    depends_on:
      - api

My issue is that, sadly, this worker often runs out of memory (despite setting worker_max_tasks_per_child = 1000), throwing this error:

[2024-03-19 17:23:24,533: CRITICAL/MainProcess] Unrecoverable error: MemoryError()
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/celery/worker/worker.py", line 203, in start
  File "/usr/local/lib/python3.10/site-packages/celery/bootsteps.py", line 116, in start
  File "/usr/local/lib/python3.10/site-packages/celery/bootsteps.py", line 365, in start
  File "/usr/local/lib/python3.10/site-packages/celery/worker/consumer/consumer.py", line 332, in start
  File "/usr/local/lib/python3.10/site-packages/celery/bootsteps.py", line 116, in start
    step.start(parent)
  File "/usr/local/lib/python3.10/site-packages/celery/worker/consumer/consumer.py", line 628, in start
    c.loop(*c.loop_args())
  File "/usr/local/lib/python3.10/site-packages/celery/worker/loops.py", line 97, in asynloop
    next(loop)
  File "/usr/local/lib/python3.10/site-packages/kombu/asynchronous/hub.py", line 362, in create_loop
  File "/usr/local/lib/python3.10/site-packages/kombu/transport/redis.py", line 1326, in on_readable
  File "/usr/local/lib/python3.10/site-packages/kombu/transport/redis.py", line 562, in on_readable
  File "/usr/local/lib/python3.10/site-packages/kombu/transport/redis.py", line 955, in _brpop_read
  File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 1275, in parse_response
  File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 865, in read_response
  File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 346, in read_response
  File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 356, in _read_response
  File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 259, in readline
  File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 209, in _read_from_socket
MemoryError
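
For reference, here is a minimal sketch of how the setting mentioned above might be declared. The module name celeryconfig.py, the way it is loaded, and the memory limit value are assumptions for illustration, not taken from my actual project. Note that both settings recycle pool child processes, while the traceback above is raised in MainProcess, which may be why recycling children did not help here.

```python
# celeryconfig.py (hypothetical module name; it could be loaded with
# app.config_from_object("celeryconfig") in the application setup)

# Replace each pool child process after it has executed this many tasks.
# This is the value quoted in the question.
worker_max_tasks_per_child = 1000

# Replace a pool child once its resident memory exceeds this many kilobytes.
# The number below is an illustrative assumption, not a recommendation.
worker_max_memory_per_child = 200_000
```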

And the REAL BIG issue for me right now is that the Docker container never restarts after hitting this crash, and I have no idea why! My Django container also gets memory errors and seems to restart fine (with the exact same restart_policy), but not this one...

I tried setting the container's mem_limit to an arbitrary value (1/4 of the host's RAM), since I read that the MemoryError might be stopping the container without any error, but to no avail. (As I said, my Django container has already hit MemoryErrors and restarted fine.)


Solution

  • It was actually a bug in Celery; upgrading it to version 5.3.6 fixed my issue.
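
To confirm which Celery version is actually installed inside the worker image, something like the following could be run in the container. This is only a sketch: the helper names parse_release and celery_is_new_enough are made up for illustration, and the comparison ignores pre-release suffixes such as "rc1".

```python
from importlib.metadata import PackageNotFoundError, version

# Release that the answer above reports as fixing the issue.
MIN_CELERY = (5, 3, 6)


def parse_release(text):
    """Turn '5.3.6' or '5.4.0rc1' into a tuple of the leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first component with no leading digits
        parts.append(int(digits))
    return tuple(parts)


def celery_is_new_enough(installed):
    """Return True if the given version string is at least MIN_CELERY."""
    return parse_release(installed) >= MIN_CELERY


if __name__ == "__main__":
    try:
        installed = version("celery")
    except PackageNotFoundError:
        print("celery is not installed in this environment")
    else:
        status = "ok" if celery_is_new_enough(installed) else "older than 5.3.6"
        print(installed, status)
```

If this reports an older version, the image was probably built before the requirement was bumped and needs to be rebuilt.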