I am using multiprocessing to do a bunch of parallel computing and returning the results as a list. The main issue right now is that the function often runs out of memory. My current code looks like this:
    import multiprocessing
    import pickle
    from functools import partial

    def some_func(a):
        b = a**2  # some calculations here
        return b

    with multiprocessing.Pool() as p:
        results_list = p.map(partial(some_func), a_list_of_a)

    with open("results/final_data.pkl", "wb") as f:
        pickle.dump(results_list, f)
I think one solution might be to split a_list_of_a into several batches and write the results to disk batch by batch to reduce memory usage. Is that possible?
Edited:

The main issue with my some_func is that it returns a large object: a networkx graph, specifically. The input a_list_of_a is a list of node numbers, and some_func returns graphs generated from the given nodes. The length of a_list_of_a is around 15,000. I have 64GB of memory, but the process is still killed for running out of memory.

Maybe there is a better way to store a networkx Graph? I do need to keep the nodes' attributes when storing the Graph object.
You want to get a much better feel for how memory is being consumed, and being released. The example some_func clearly allocates far less RAM than your "real" function does.
Consider using a more portable data format than pickle: for example CSV or JSONL, an RDBMS, or a binary format like Parquet or HDF5. The trouble with pickle is that if you rev library versions, it's easy to wind up with unreadable binary files. And even when a file is readable, you have to read all of it, including large objects you might not need at the moment.
    results_list = p.map(partial(some_func), a_list_of_a)

Here, you're sending a "large" result object over a pipe to the parent, which allocates storage for it and then sends it to the filesystem. Much better to have each child do the FS I/O work and return a "done!" indicator. Name the files according to the child's PID, or perhaps according to an input data hash.
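A sketch of that pattern, using an input-data hash for the filename (the names write_result, run, and OUT_DIR are mine, and some_func is a stand-in for your real graph-building work):

```python
import hashlib
import multiprocessing
import os
import pickle
import tempfile

# stand-in for your results/ directory
OUT_DIR = os.path.join(tempfile.gettempdir(), "per_child_results")
os.makedirs(OUT_DIR, exist_ok=True)

def some_func(a):
    return a ** 2  # placeholder for the real, memory-hungry computation

def write_result(a):
    result = some_func(a)
    # name the file by a hash of the input, so each task's output is findable
    name = hashlib.sha1(repr(a).encode()).hexdigest()[:12]
    path = os.path.join(OUT_DIR, name + ".pkl")
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return path  # a tiny "done!" token crosses the pipe, not the big object

def run(inputs):
    with multiprocessing.Pool() as p:
        return p.map(write_result, inputs)
```

Now the parent only ever holds a list of short path strings; each large graph lives in exactly one child's memory, briefly, before going to disk.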
You didn't really describe what a is, nor whether some_func takes a microsecond or a minute.
There's overhead incurred at each pipeline stage, so consider batching the input tasks into some convenient size, perhaps one that corresponds to between 1 and 10 seconds of processing time per batch.
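The simplest way to get that batching is the chunksize argument that Pool.map already accepts (no manual splitting needed); the run wrapper below is just for illustration:

```python
import multiprocessing

def some_func(a):
    return a ** 2  # stand-in for a task taking a few milliseconds

def run(inputs, chunksize=64):
    with multiprocessing.Pool() as p:
        # chunksize hands each worker a batch of tasks per pipe round-trip,
        # amortizing the per-task serialization and scheduling overhead
        return p.map(some_func, inputs, chunksize=chunksize)
```

Tune chunksize so one chunk lands in that 1-10 second sweet spot for your real workload.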
Consider using .imap_unordered() instead. If you don't care about the order, and don't care when something happens as long as it happens, then give the multiprocessing module some scheduling flexibility; do not unduly constrain it.
It's possible you'll need a post-processing phase to assemble results to put them in some required order, and that's fine. It may well take fewer elapsed minutes to complete all that.
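A sketch of that shape, with the post-processing phase included (tagging each result with its input is my convention here, so order can be restored afterwards):

```python
import multiprocessing

def some_func(a):
    # tag each result with its input; imap_unordered yields in completion order
    return (a, a ** 2)

def run(inputs):
    with multiprocessing.Pool() as p:
        results = list(p.imap_unordered(some_func, inputs, chunksize=16))
    # post-processing phase: reassemble into input order only if you need it
    results.sort(key=lambda pair: pair[0])
    return [value for _, value in results]
```

Fast-finishing tasks stream back immediately instead of waiting behind slow neighbors, which is where the elapsed-minutes savings come from.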