
Python create SharedMemory instance using existing buffer (bytes from marshal.dumps())


I would like to create an instance of multiprocessing.shared_memory.SharedMemory and pass in, from outside, the buffer it should use to hold the data.

My use case is the following:

import marshal

from multiprocessing.shared_memory import SharedMemory


data = {'foo': 1, 'bar': 'some text'}
data_bytes = marshal.dumps(data)
shm = SharedMemory(create=True, size=len(data_bytes))

for i,b in enumerate(data_bytes):
    shm.buf[i] = b

As you can see, I need to serialise some data so that I can later share it across multiple processes. The snippet above uses twice the memory that is actually needed, because the serialised data held in the data_bytes variable has to be copied into the SharedMemory buffer. The copy also takes a considerable amount of time, since in my use case the data is about 1 GB in size.

The only workaround I have found so far, which is not really viable, is to guess how much space the serialised data will take, allocate that much space in a SharedMemory instance and have marshal write directly into it, e.g.

shm = SharedMemory(create=True, size=BIG_ENOUGH_SIZE)
marshal.dump(data, shm.buf.obj)

However, if my guess is too low, marshal.dump(data, shm.buf.obj) (correctly) raises an error because there is not enough space to write the serialised data.
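
To make that failure mode concrete, here is a minimal sketch. It assumes CPython, where shm.buf.obj is the underlying mmap object and its write() raises ValueError when the data does not fit. The helper name dump_fits is hypothetical and only serves as an illustration.

```python
import marshal
from multiprocessing.shared_memory import SharedMemory


def dump_fits(data, guessed_size):
    """Hypothetical helper: True if marshal.dump() fits in a block
    created with size=guessed_size, False if the block is too small."""
    shm = SharedMemory(create=True, size=guessed_size)
    try:
        # shm.buf.obj is the mmap object behind the buffer (CPython detail).
        marshal.dump(data, shm.buf.obj)
        return True
    except ValueError:
        # Assumption: mmap.write() signals "data out of range" this way.
        # marshal also raises ValueError for unsupported objects, so a
        # real implementation would need to tell the two cases apart.
        return False
    finally:
        shm.close()
        shm.unlink()
```

Note that the block may be rounded up to the platform's page size, so a guess that is only slightly too low can still succeed on some systems.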


Solution

  • As of Python 3.9, there seems to be no way to pass an existing buffer object to a SharedMemory instance. The best I have achieved is to copy the data with a slice assignment, which in CPython is much faster than a for loop.

    import marshal
    
    from multiprocessing.shared_memory import SharedMemory
    
    
    data = {'foo': 1, 'bar': 'some text'}
    data_bytes = marshal.dumps(data)
    data_bytes_len = len(data_bytes)
    
    shm = SharedMemory(create=True, size=data_bytes_len)
    
    # 'shm' may allocate more memory than the actual requested amount
    # hence we must specify the length of data_bytes in the slice assignment
    shm.buf[:data_bytes_len] = data_bytes
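
For completeness, a consumer needs both the block's name and the payload length, since the block may be larger than requested. The following is a rough sketch of the full round trip, run in a single process for brevity. The function name roundtrip_via_shared_memory is hypothetical, and in practice the reader would live in another process and receive the name and length through some other channel.

```python
import marshal
from multiprocessing.shared_memory import SharedMemory


def roundtrip_via_shared_memory(data):
    """Hypothetical helper: write data to a shared block, read it back by name."""
    payload = marshal.dumps(data)
    payload_len = len(payload)

    writer = SharedMemory(create=True, size=payload_len)
    try:
        # Slice assignment copies the payload in one step.
        writer.buf[:payload_len] = payload

        # Attach to the existing block by name, as a consumer would.
        reader = SharedMemory(name=writer.name)
        try:
            # Releasing the slice before close() avoids a BufferError
            # caused by a memoryview that is still exported.
            with reader.buf[:payload_len] as view:
                return marshal.loads(view)
        finally:
            reader.close()
    finally:
        writer.close()
        writer.unlink()
```

Passing the memoryview slice straight to marshal.loads avoids making another bytes copy of the payload on the reading side.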