Tags: python, python-multiprocessing, jsonlines

Python: How to write jsonline without overwriting?


I have a piece of code that processes thousands of files in a directory. For each file, it generates an object (a dictionary) that includes, among its key-value pairs:

{
    ........
    'result': [...a very long list...]
}

If I process all the files, collect the results in a list, and then use the jsonlines library to write them all at once, my laptop (a Mac) runs out of memory.

So my plan is to process the files one by one: get a result, append it to the JSON Lines file, then delete the object to release the memory.

After checking the official documentation (https://jsonlines.readthedocs.io/en/latest/), I couldn't find a method that writes to a JSON Lines file without overwriting it.

So how can I handle such a large output?

Besides, I'm using parallel threads to process the files:

from multiprocessing.dummy import Pool
Pool(4).map(get_result, file_lst)

I would like to open the JSON Lines file, write each result as it is produced, and then release the memory.


Solution

  • If I understand your question correctly, I think this will solve it:

    import jsonlines

    with jsonlines.open('yourTextFile', mode='a') as writer:
        writer.write(...)
    

    You mentioned that the file gets overwritten. I think that is because you open it with mode='w' (write), which truncates the file each time, instead of mode='a' (append), which adds new lines to the end.
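
To tie this to the thread pool from the question, here is a minimal sketch of appending one record per file from several threads. It is an illustration, not the asker's actual code: it uses the standard library's json module and the built-in open() so that it runs without extra packages, and jsonlines.open(path, mode='a') should be able to play the same role as the open() call. The get_result body, the out_path parameter, and the names append_record and run are hypothetical. A lock is assumed to be enough to keep lines from different threads from interleaving.

```python
import json
import threading
from multiprocessing.dummy import Pool  # thread-based Pool, as in the question

# One lock shared by all worker threads so that appends do not interleave.
write_lock = threading.Lock()


def get_result(path):
    # Hypothetical stand-in for the question's real per-file processing.
    return {"file": path, "result": [len(path)]}


def append_record(record, out_path):
    # Serialize outside the lock; only the file write itself is serialized.
    line = json.dumps(record) + "\n"
    with write_lock:
        # mode "a" appends to the end of the file instead of truncating it.
        with open(out_path, "a", encoding="utf-8") as fh:
            fh.write(line)


def run(file_lst, out_path, workers=4):
    def process_one(path):
        # The record is written immediately and not returned, so it can be
        # garbage-collected as soon as this function exits.
        append_record(get_result(path), out_path)

    with Pool(workers) as pool:
        pool.map(process_one, file_lst)
```

Calling run(file_lst, 'results.jsonl') more than once keeps adding lines to the same file. Because each worker writes its own result and returns nothing, the pool never holds all the results in one list, which is what caused the memory problem in the first place. The order of lines in the file is not guaranteed to match the order of file_lst.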