I have a list containing millions of small records as dicts. Instead of serialising the entire thing to a single file as JSON, I would like to write each record to a separate file. Later I need to reconstitute the list from JSON deserialised from the files.
My goal isn't really minimising I/O so much as finding a general strategy for serialising individual collection elements to separate files concurrently or asynchronously. What's the most efficient way to accomplish this in Python 3.x or a similar high-level language?
For those looking for a modern Python-based solution supporting async/await, I found this neat package which does exactly what I'm looking for: https://pypi.org/project/aiofiles/. Specifically, I can do
import json
from typing import Iterable

import aiofiles

async def json_reader(files: Iterable):
    """An async generator that reads and parses JSON from a list of files."""
    for file in files:
        async with aiofiles.open(file) as f:
            # read() returns the whole file as one string, which is what
            # json.loads expects (readlines() would give a list of lines).
            data = await f.read()
            yield json.loads(data)
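For the write side (one file per record, written concurrently), here is a minimal stdlib-only sketch that avoids the aiofiles dependency by offloading each blocking write to a thread with `asyncio.to_thread` (Python 3.9+). The file-naming scheme and helper names (`write_record`, `write_records`) are my own illustration, not part of any library:

```python
import asyncio
import json
import os
import tempfile

def _dump(path, record):
    # Plain blocking write; runs in a worker thread.
    with open(path, "w") as f:
        json.dump(record, f)

async def write_record(path, record):
    # asyncio.to_thread keeps the event loop responsive while the OS write runs.
    await asyncio.to_thread(_dump, path, record)

async def write_records(directory, records):
    # Fan out: one file per record, all writes in flight concurrently.
    await asyncio.gather(
        *(write_record(os.path.join(directory, f"{i}.json"), rec)
          for i, rec in enumerate(records))
    )

# Round-trip demo with a throwaway directory.
records = [{"id": i, "value": i * i} for i in range(5)]
with tempfile.TemporaryDirectory() as d:
    asyncio.run(write_records(d, records))
    reloaded = sorted(
        (json.load(open(os.path.join(d, name))) for name in os.listdir(d)),
        key=lambda r: r["id"],
    )
assert reloaded == records
```

Note that for millions of small records the bottleneck is usually filesystem metadata overhead (one open/close per record), not serialisation, so batching records into a smaller number of files may still be worth considering.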