python · multiprocessing · ctypes

How to get a Python multiprocessing array's pointer and pass it to a C++ program?


I need to allocate arrays in Python and pass them into a C++ program, while Python still needs to process them as well. But I found that when I use multiprocessing, the array's address changes.

Below is my code:

// this is test.cpp; the DLL export declarations are in test.h
#include "test.h"
#include <iostream>
using namespace std;

void someWork(double* data, long* flag){
    cout << "cpp flag address:" << flag << endl;
    //do some work
}
# test.py
import multiprocessing as mp
from multiprocessing.sharedctypes import RawArray
import ctypes

data = RawArray(ctypes.c_double, 2000)
flag = RawArray(ctypes.c_long, 20)
pkg = ctypes.cdll.LoadLibrary(r"test.dll")
pkg.someWork.argtypes = [
    ctypes.POINTER(ctypes.c_double * 2000),# dataArray
    ctypes.POINTER(ctypes.c_long * 20)#flagArray
]

def proc_py():
    idx = 0
    while True:
        if flag[idx] == 1:
            # do something
            flag[idx] = 0
            idx = (idx + 1) % 20

def proc_cpp():
    pkg.someWork(ctypes.pointer(data), ctypes.pointer(flag))

def main():
    p_cpp = mp.Process(target=proc_cpp, args=())
    p_py = mp.Process(target=proc_py, args=())
    p_cpp.start()
    p_py.start()
    p_cpp.join()
    p_py.join()

if __name__ == '__main__':
    print("py flag address:", ctypes.byref(flag))
    # proc_cpp()
    main()

The result is: when I just run proc_cpp in Python, the addresses are the same:

py flag address: <cparam 'P' (0000019DA8282400)>
cpp flag address:   0000019DA8282400

But when I run main, the addresses are different:

py flag address: <cparam 'P' (000001CB42A32400)>
cpp flag address:   0000012F1E152400

I know that Python's multiprocessing must use shared memory to share data between processes, but I failed with both mp.Array/Array.get_obj() and mp.sharedctypes.RawArray/ctypes.pointer(). Is there any way to solve this?


Solution

  • Do not create the RawArray at module level, outside the "run once" main code, or you are creating different arrays: multiprocessing (with the spawn start method used on Windows) starts each child by re-importing the module, so module-level code runs again in every child and each process ends up with its own, unrelated array. Create the RawArray once in the main process, and pass that RawArray as a parameter to the target function of the new processes. The virtual address each process "sees" will be different, but the physical memory will be the same.

    Here's an example:

    test.cpp:

    This will display the pointer address and then change the specified index in the shared array.

    #include <iostream>
    #include <cstdint>
    using namespace std;
    
    #define API __declspec(dllexport)
    
    extern "C" API
    void set(double* data, int index, double value) {
        cout << data << ' ' << index << ' ' << value << endl;
        data[index] = value;
    }
    

    test.py:

    This passes the shared array to each process. The main process also changes an element before starting the children. The lock is used because RawArray is not synchronized and the printing from the C++ code would otherwise interleave; with the lock the calls won't really run in parallel, but the example still illustrates that the processes get different virtual addresses while sharing the same data.

    import multiprocessing as mp
    from multiprocessing.sharedctypes import RawArray
    from ctypes import *
    
    dll = CDLL('./test')
    dll.set.argtypes = POINTER(c_double),c_int,c_double
    dll.set.restype = None
    
    def call(lock,data,index,value):
        with lock:
            dll.set(data,index,value)
    
    if __name__ == '__main__':
    
        # This code runs once in the main process.
        # The lock and shared data are created once only and passed to other processes.
    
        lock = mp.Lock()
        data = RawArray(c_double, 3)
        data[0] = 0.5
        p1 = mp.Process(target=call, args=(lock,data,1,1.25))
        p2 = mp.Process(target=call, args=(lock,data,2,2.5))
        p1.start()
        p2.start()
        p1.join()
        p2.join()
        print(list(data))
    

    Output (different addresses, same shared data):

    00000269D66E0000 1 1.25
    00000187F0B90000 2 2.5
    [0.5, 1.25, 2.5]
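
    Applied back to the code from the question, the same pattern would look roughly like this. This is only a sketch, not tested against the original test.dll: it assumes someWork is exported with C linkage, simplifies the argtypes to plain pointers as in the example above, and replaces the infinite polling loop with a bounded one so the example terminates.

    import multiprocessing as mp
    from multiprocessing.sharedctypes import RawArray
    import ctypes
    
    # module-level code runs again in each spawned child, so every process
    # loads the DLL itself
    pkg = ctypes.CDLL('./test.dll')
    pkg.someWork.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.POINTER(ctypes.c_long)]
    pkg.someWork.restype = None
    
    def proc_cpp(data, flag):
        # ctypes arrays can be passed directly where a POINTER argtype is declared
        pkg.someWork(data, flag)
    
    def proc_py(flag):
        # simplified, bounded version of the original polling loop
        for idx in range(20):
            if flag[idx] == 1:
                flag[idx] = 0
    
    if __name__ == '__main__':
        # created once, in the main process only, then handed to the children
        data = RawArray(ctypes.c_double, 2000)
        flag = RawArray(ctypes.c_long, 20)
        p_cpp = mp.Process(target=proc_cpp, args=(data, flag))
        p_py = mp.Process(target=proc_py, args=(flag,))
        p_cpp.start()
        p_py.start()
        p_cpp.join()
        p_py.join()

    As in the example above, each process sees the arrays at a different virtual address, but they all refer to the same shared memory, so whatever the C++ side writes through data and flag is visible to proc_py and to the main process.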