python-3.x multiprocessing ctypes mmap

multiprocessing.RawArray operation


I read that a RawArray can be shared between processes without being copied, and wanted to understand how that is possible in Python.

I saw in sharedctypes.py that a RawArray is constructed from a BufferWrapper from heap.py, then zeroed with ctypes.memset.
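A minimal sketch of what that construction looks like from the outside: a freshly created RawArray reads as all zeros, and it exposes its memory through the buffer protocol, so a memoryview over it does not copy anything. The element type and length are arbitrary choices for illustration.

```python
import ctypes
from multiprocessing.sharedctypes import RawArray

# Four C ints; the backing memory is zeroed on creation.
arr = RawArray(ctypes.c_int, 4)
print(list(arr))

# Writes go straight into the underlying buffer.
arr[2] = 7

# A memoryview wraps the same memory instead of copying it.
view = memoryview(arr)
print(view.nbytes == ctypes.sizeof(arr), arr[2])
```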

BufferWrapper is built on an Arena object, which itself is built from an mmap (or 100 mmaps on Windows, see line 40 in heap.py).

I read that the mmap system call is what actually allocates the memory on Linux/BSD, and that the Python module uses MapViewOfFile on Windows.
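A small way to see an anonymous mapping being shared rather than copied, as a sketch only. It assumes a Unix-like system where the "fork" start method is available; the names child_write and demo are made up for this example. With fork the mmap object is inherited by the child, so nothing is pickled.

```python
import mmap
import multiprocessing as mp
import struct

def child_write(buf):
    # Runs in the forked child: store one C int at byte offset 0.
    struct.pack_into("i", buf, 0, 1234)

def demo():
    # Assumption: "fork" is available (Linux/BSD/macOS, not Windows).
    ctx = mp.get_context("fork")
    # Anonymous mapping of 4 bytes; on Unix it is a shared mapping by default.
    buf = mmap.mmap(-1, 4)
    p = ctx.Process(target=child_write, args=(buf,))
    p.start()
    p.join()
    # The parent reads the value the child wrote into the same memory.
    return struct.unpack_from("i", buf, 0)[0]

if __name__ == "__main__":
    print(demo())
```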

mmap seems handy then. It even seems to work directly with multiprocessing.Pool:

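from multiprocessing import Pool
from struct import pack_into
from mmap import mmap

def pack_into_mmap(idx_nums_tup):
    # Relies on the fork start method: total and shared_mmap are globals
    # set in the parent before the pool is created and inherited by workers.
    idx, ints_to_pack = idx_nums_tup
    # Each worker writes its half, starting at byte offset idx * 4 * total // 2.
    pack_into(str(len(ints_to_pack)) + 'i', shared_mmap, idx*4*total//2 , *ints_to_pack)


if __name__ == '__main__':

    total = 5 * 10**7
    shared_mmap = mmap(-1, total * 4)  # anonymous mapping, 4 bytes per int
    ints_to_pack = range(total)

    pool = Pool()
    pool.map(pack_into_mmap, enumerate((ints_to_pack[:total//2], ints_to_pack[total//2:])))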

My question is:

How does the multiprocessing module know not to copy the mmap-based RawArray object between processes, like it does with "regular" Python objects?


Solution

  • [Python 3.Docs]: multiprocessing - Process-based parallelism serializes / deserializes data exchanged between processes using a Python-specific protocol: [Python 3.Docs]: pickle - Python object serialization (hence the terms pickle / unpickle).

    According to [Python 3.Docs]: pickle - object.__getstate__():

    Classes can further influence how their instances are pickled; if the class defines the method __getstate__(), it is called and the returned object is pickled as the contents for the instance, instead of the contents of the instance’s dictionary. If the __getstate__() method is absent, the instance’s __dict__ is pickled as usual.

    As seen in the (Windows variant of) Arena.__getstate__ (class chain: sharedctypes.RawArray -> heap.BufferWrapper -> heap.Heap -> heap.Arena), only the metadata (name and size) is pickled for the Arena instance, not the buffer itself.

    Conversely, __setstate__ rebuilds the buffer from that metadata, so the receiving process maps the same memory instead of receiving a copy of its contents.
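The same pattern can be sketched with a toy class. This is not the real heap.Arena: ToyArena and the _REGISTRY dict are hypothetical, with the dict standing in for the operating system's namespace of named mappings. The point is only that pickle stores the (size, name) pair, and unpickling re-attaches to the existing buffer by name.

```python
import pickle

# Hypothetical stand-in for the OS namespace of named mappings.
_REGISTRY = {}

class ToyArena:
    """Illustrative only; not multiprocessing.heap.Arena."""

    def __init__(self, name, size):
        self.name = name
        self.size = size
        # Create the buffer once and register it under its name.
        self.buffer = _REGISTRY.setdefault(name, bytearray(size))

    def __getstate__(self):
        # Pickle only the metadata, never the buffer contents.
        return (self.size, self.name)

    def __setstate__(self, state):
        # Re-attach to the existing buffer by name instead of copying it.
        self.size, self.name = state
        self.buffer = _REGISTRY[self.name]

arena = ToyArena("toy-1", 1_000_000)
payload = pickle.dumps(arena)     # a few dozen bytes, not a megabyte
clone = pickle.loads(payload)

clone.buffer[0] = 42              # write through the unpickled object
print(len(payload), arena.buffer[0], clone.buffer is arena.buffer)
```

In the real module the lookup by name is done by the operating system when the mapping is reopened, which is what makes it work across processes.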