I have got a program that handle about 500 000 files {Ai} and for each file, it will fetch a definition {Di} for the parsing.
For now, each file {Ai} is parsed by a dedicated celery task and each time the definition file {Di} is parsed again to generate an object. This object is used for the parsing of the file {Ai} (JSON representation).
I would like to store the definition file (generated object) {Di(object)} to make it available for whole task.
So I wonder what would be the best choice to manage it:
For performance and memory usage, what would be the best choice ?
Using Memcached sounds like a much easier solution - a task is for processing, memcached is for storage - why use a task for storage?
Personally I'd recommend using Redis over memcached.
An alternative would be to try ZODB - it stores Python objects natively. If your application really suffers from serialization overhead maybe this would help. But I'd strongly recommend testing this with your real workload against JSON/memcached.