I'm using Celery with a Redis broker to do some "heavy" processing for my Django app. Everything is running locally in Docker containers on WSL2.
The tasks output a JSON which is roughly 2.5 Mb large and it takes up to 9 seconds to retrieve the result via get()
in the Django app. For smaller payloads, the time goes down
I tried increasing the RAM and the CPU for WSL2 up to 6 CPUs and 8Gb RAM. Celery was configured with --max-memory-per-child=1024000 --concurrency=4
I've tried using different result_backend configuration with similar results:
I tried setting an interval when using SQLite (doesn't matter for RPC & Redis) with a 0.5sec improvement get(interval=0.01)
I also tried changing the result_serializer from JSON to pickle for poorer performance. But I don't think the serializer is the culprit here as serializing / deserializing the same JSON is pretty fast in console
>>> timeit.timeit(lambda: pickle.dumps(big_dict,0), number=10)
0.567067899999528
>>> timeit.timeit(lambda: pickle.loads(str), number=10)
0.3542163999991317
I tried using compression, only zlib seemed to provide a small gain.
I'm not too familiar with this setup but IMHO I should be able to retrieve results faster. The best I could achieve was 6sec. Any idea how to improve this or how to explain it ?
settings.py
CELERY_BROKER_URL = "redis://{host}:{port}/{db}".format(
host=os.environ.get('REDIS_HOST'),
port=os.environ.get('REDIS_PORT'),
db=os.environ.get('CELERY_REDIS_DB')
)
CELERY_RESULT_BACKEND = "redis://{host}:{port}/{db}".format(
host=os.environ.get('REDIS_HOST'),
port=os.environ.get('REDIS_PORT'),
db=os.environ.get('CELERY_REDIS_DB')
)
# CELERY_RESULT_BACKEND = 'db+sqlite:///celery.sqlite' # SQL Example (need SQLAlchemy==1.4.29 in requirements.txt)
# CELERY_RESULT_BACKEND = 'rpc://localhost' # RPC Example
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
Thanks
In general, Redis has a reputation for being bad at dealing with large objects and is not generally intended to be a large object store. You're better off using a general purpose RDBMS or a file store and returning a key to where the JSON can be retrieved.