
Share objects between celery tasks


I have a program that handles about 500,000 files {Ai}, and for each file it fetches a definition file {Di} that drives the parsing.

At the moment, each file {Ai} is parsed by a dedicated Celery task, and every time the definition file {Di} is parsed again to generate an object (a JSON representation) that is then used to parse the file {Ai}.

I would like to store the generated definition object {Di(object)} so that it is available to all of the tasks.

So I wonder what the best way to manage this would be:

  1. Memcached + python-memcached, or
  2. a long-running task that "stores" the objects and exposes a set(add)/get interface.

In terms of performance and memory usage, what would be the best choice?
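
For reference, option 1 would look roughly like the sketch below. This is only my rough idea, not working code: the broker URL, the cache key format, and the definition "parsing" (just a `json.load` stand-in here) are all placeholders.

```python
# Rough sketch of option 1: cache the parsed definition {Di} in memcached so
# each Celery task only parses it on a cache miss. Names are placeholders.
import json

import memcache
from celery import Celery

app = Celery("parser", broker="redis://localhost:6379/0")  # broker is just an example
mc = memcache.Client(["127.0.0.1:11211"])

def get_definition(def_path):
    """Return the definition object for def_path, parsing it only on a miss."""
    key = "definition:" + def_path.replace(" ", "_")  # memcached keys must not contain spaces
    cached = mc.get(key)
    if cached is not None:
        return json.loads(cached)
    with open(def_path) as fh:              # placeholder for the real {Di} parsing
        definition = json.load(fh)
    mc.set(key, json.dumps(definition))
    return definition

@app.task
def parse_file(file_path, def_path):
    definition = get_definition(def_path)
    # ... application-specific parsing of {Ai} using the definition ...
    return {"file": file_path, "fields": len(definition)}
```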


Solution

  • Using Memcached sounds like a much easier solution - a task is for processing, memcached is for storage - why use a task for storage?

    Personally I'd recommend using Redis over memcached (see the sketch below).

    An alternative would be to try ZODB - it stores Python objects natively. If your application really suffers from serialization overhead, maybe this would help, but I'd strongly recommend testing it against JSON/memcached with your real workload.
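
If you go with Redis, the caching pattern is essentially the same get/set as with memcached. Here is a rough sketch using redis-py; the key layout, the one-day TTL, and the caller-supplied `parse_definition` callable are arbitrary choices for illustration, not anything from your code.

```python
# Same cache-on-miss pattern, but against Redis via redis-py. The definition is
# serialised to JSON once; every task re-reads it instead of re-parsing {Di}.
import json

import redis

cache = redis.Redis(host="localhost", port=6379, db=1)

def load_definition(def_path, parse_definition):
    """Return the definition for def_path, parsing it only on a cache miss."""
    key = f"definition:{def_path}"
    raw = cache.get(key)
    if raw is not None:
        return json.loads(raw)
    definition = parse_definition(def_path)            # caller-supplied parser
    cache.set(key, json.dumps(definition), ex=86400)   # expire after one day
    return definition
```

With 500,000 tasks hitting what is presumably a much smaller set of definitions, the per-task cost drops to one GET plus one `json.loads`, which should be far cheaper than re-parsing the definition file every time.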