I have some algorithms/methods which I'm trying to evaluate by running hundreds of thousands of test cases. Each test case has some metadata associated with it. At the moment I'm pickling the metadata for each test case on disk in a directory hierarchy. I'm running the test cases in parallel across 48 nodes. However, I'm finding that I'm often I/O-bound rather than CPU-bound. I imagine that 48 concurrent disk accesses might be causing some issues. I wonder if I should store the metadata in a centralised key-value store (or something similar) which would serialise the disk access (and/or cache the values). I don't have much experience with either SQL or NoSQL databases.
Have you considered using Redis? You'd be able to load all the pickled objects into memory at once, then have quick random access to them until shutdown, with the Redis server serialising access on its side.
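A minimal sketch of what that might look like, assuming the `redis-py` client and a Redis server on `localhost`; the `case:` key prefix and the example metadata fields are made up for illustration:

```python
# Sketch: store pickled per-test-case metadata in Redis instead of
# thousands of small files. The key scheme ("case:<id>") and the
# metadata contents below are hypothetical examples.
import pickle

try:
    import redis  # pip install redis
except ImportError:
    redis = None  # allows the helper functions to be reused without the server


def save_metadata(client, case_id, metadata):
    """Pickle the metadata object and store it under a per-case key."""
    client.set(f"case:{case_id}", pickle.dumps(metadata))


def load_metadata(client, case_id):
    """Fetch and unpickle a case's metadata; returns None if the key is missing."""
    raw = client.get(f"case:{case_id}")
    return pickle.loads(raw) if raw is not None else None


if __name__ == "__main__" and redis is not None:
    # One connection per worker node; the Redis server serialises
    # concurrent access on its end, so 48 clients is not a problem.
    client = redis.Redis(host="localhost", port=6379)
    save_metadata(client, 42, {"algorithm": "method_a", "runtime_s": 1.7})
    print(load_metadata(client, 42))
```

Each of your 48 nodes would open its own connection and read/write by key, which replaces the 48-way contention on the filesystem with a single in-memory store. Note that plain Redis keeps everything in RAM, so check that all the metadata fits (or enable persistence/eviction as needed).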