Tags: python, multiprocessing, gmpy

sharing gmpy2 multi-precision integer between processes without copying


Is it possible to share gmpy2 multiprecision integers (https://pypi.python.org/pypi/gmpy2) between processes (created by multiprocessing) without creating copies in memory? Each integer has about 750,000 bits. The integers are not modified by the processes.

Thank you.


Solution

  • Update: Tested code is below.

    I would try the following untested approach:

    Create a memory mapped file using Python's mmap library.

    Use gmpy2.to_binary() to convert a gmpy2.mpz instance into a binary string.

    Write both the length of the binary string and the binary string itself into the memory mapped file. To allow for random access, begin every write at a multiple of a fixed record size, say 94,000 bytes in your case (750,000 bits is 93,750 bytes, which leaves room for the length prefix and gmpy2's small serialization header).

    Populate the memory mapped file with all your values.

    Then in each process, use gmpy2.from_binary() to read the data from the memory mapped file.

    You need to read both the length of the binary string and the binary string itself. You should be able to pass a slice of the memory mapped file directly to gmpy2.from_binary().

    It may be simpler to create a list of (start, end) values for the position of each byte string in the memory mapped file and then pass that list to each process.
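    The fixed-stride, length-prefixed record layout described in the steps above can be sketched with plain bytes before gmpy2 enters the picture. RECORD here is an illustrative stride, not the ~94,000 bytes the question would need:

```python
import mmap
import struct

# Illustrative stride; for 750,000-bit integers this would be ~94,000.
RECORD = 16

# Anonymous mapping; with multiprocessing's fork start method this
# memory would be shared with child processes.
mm = mmap.mmap(-1, 3 * RECORD)

payloads = [b'abc', b'hello', b'xy']
for i, b in enumerate(payloads):
    mm.seek(i * RECORD)                  # record i starts at i * RECORD
    mm.write(struct.pack('i', len(b)))   # 4-byte length prefix
    mm.write(b)                          # the payload itself

# Random access by index: seek to the stride, read the prefix, slice.
mm.seek(1 * RECORD)
n = struct.unpack('i', mm.read(4))[0]
assert mm.read(n) == b'hello'
```

    With gmpy2, the payload would be the output of gmpy2.to_binary() and the slice would go to gmpy2.from_binary(); the layout is the same.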

    Update: Here is some sample code that has been tested on Linux with Python 3.4.

    import mmap
    import struct
    import multiprocessing as mp
    import gmpy2
    
    # Number of mpz integers to place in the memory buffer.
    z_count = 40000
    # Maximum number of bits in each integer.
    z_bits = 750000
    # Total number of bytes reserved for each record: a 4-byte length
    # prefix plus the to_binary() data (750,000 bits = 93,750 bytes,
    # plus a small gmpy2 header), rounded up to a multiple of 4.
    z_size = 4 + (((z_bits + 31) // 32) * 4)
    
    def f(instance):
        # mm is inherited from the parent via fork(), so every worker
        # reads the same shared mapping without copying the data.
        global mm
    
        s = 0
        for i in range(z_count):
            mm.seek(i * z_size)
            # Each record is a 4-byte length prefix followed by the
            # gmpy2.to_binary() data.
            t = struct.unpack('i', mm.read(4))[0]
            z = gmpy2.from_binary(mm.read(t))
            s += z
        print(instance, s % 123456789)
    
    def main():
        global mm
    
        mm = mmap.mmap(-1, z_count * z_size)
        rs = gmpy2.random_state(42)
        for i in range(z_count):
            z = gmpy2.mpz_urandomb(rs, z_bits)
            b = gmpy2.to_binary(z)
            mm.seek(i * z_size)
            mm.write(struct.pack('i', len(b)))
            mm.write(b)
    
        ctx = mp.get_context('fork')  # fork, so the workers inherit mm
        pool = ctx.Pool(4)
        pool.map_async(f, range(4))
        pool.close()
        pool.join()
    
    if __name__ == '__main__':
        main()
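    The (start, end) list alternative mentioned above can be sketched as follows. To keep the sketch runnable without gmpy2 installed, int.to_bytes / int.from_bytes stand in for gmpy2.to_binary / gmpy2.from_binary; swap them back for real mpz values. The slicing and offset logic is unchanged, and no padding is needed because the list records where each value lives:

```python
import mmap
import struct
import multiprocessing as mp

def worker(start, end):
    # mm is inherited from the parent via fork(); only the two integer
    # offsets are pickled and sent to the worker.
    n = struct.unpack('i', mm[start:start + 4])[0]
    assert end - (start + 4) == n
    # int.from_bytes stands in for gmpy2.from_binary in this sketch.
    return int.from_bytes(mm[start + 4:end], 'little')

def main():
    global mm
    values = [10**20, 2**64, 12345]
    # int.to_bytes stands in for gmpy2.to_binary in this sketch.
    blobs = [v.to_bytes((v.bit_length() + 7) // 8, 'little') for v in values]

    # Pack the records back to back; the (start, end) list makes a
    # fixed stride unnecessary.
    mm = mmap.mmap(-1, sum(4 + len(b) for b in blobs))
    spans, pos = [], 0
    for b in blobs:
        mm.seek(pos)
        mm.write(struct.pack('i', len(b)))
        mm.write(b)
        spans.append((pos, pos + 4 + len(b)))
        pos += 4 + len(b)

    ctx = mp.get_context('fork')   # fork, so the workers inherit mm
    with ctx.Pool(2) as pool:
        results = pool.starmap(worker, spans)
    assert results == values
    return results

if __name__ == '__main__':
    main()
```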