Tags: python, parallel-processing, key-value-store

Any value in moving from pickling many objects to a key-value store?


I have some algorithms/methods which I'm trying to evaluate by running hundreds of thousands of test cases. Each test case has some meta-data associated with it. At the moment I'm pickling the meta-data for each test case to disk in a directory hierarchy, and I'm running the test cases in parallel across 48 nodes. However, I'm finding that I'm often I/O-bound rather than CPU-bound, and I imagine that 48 concurrent disk accesses might be causing some of the trouble. I wonder whether I should store the meta-data in a centralised key-value store (or something similar) which would serialise the disk access and/or cache the values. I don't have much experience with either SQL or NoSQL databases.


Solution

  • Have you considered using redis? You could load all the pickled objects into it once, then have fast, random access from every worker until shutdown; see the sketch after this list.

    How do I get an object back from redis without using eval?
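
    A minimal sketch of that approach, assuming the redis-py client and a made-up `testcase:<id>` key scheme (the connection details and meta-data fields below are placeholders, not from the original post):

    ```python
    import pickle

    import redis  # third-party client: pip install redis

    # Hypothetical connection details; point these at your central redis server.
    r = redis.Redis(host="localhost", port=6379)

    # One-time load: pickle each test case's meta-data and store it under a
    # key derived from the case id ("testcase:<id>" is an invented scheme).
    meta = {"case_id": 42, "params": {"alpha": 0.1}, "seed": 1234}
    r.set("testcase:42", pickle.dumps(meta))

    # On any worker node: fetch the raw bytes and unpickle them directly.
    # pickle.loads answers the eval question -- no eval is involved.
    raw = r.get("testcase:42")
    if raw is not None:
        meta = pickle.loads(raw)
        print(meta["params"]["alpha"])
    ```

    Because a single redis server answers all requests from memory, the 48 workers no longer contend for the filesystem: access is serialised through one process and reads come at RAM speed. One caveat: only unpickle data you wrote yourself, since pickle can execute arbitrary code on load.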