Search code examples
c++python-3.xshared-memoryboost-interprocess

Fast communication between C++ and python using shared memory


In a cross platform (Linux and windows) real-time application, I need the fastest way to share data between a C++ process and a python application that I both manage. I currently use sockets but it's too slow when using high-bandwith data (4K images at 30 fps).

I would ultimately want to use the multiprocessing shared memory but my first tries suggest it does not work. I create the shared memory in C++ using Boost.Interprocess and try to read it in python like this:

#include <boost/interprocess/shared_memory_object.hpp>
#include <boost/interprocess/mapped_region.hpp>

int main(int argc, char* argv[])
{
    using namespace boost::interprocess;

    //Remove shared memory on construction and destruction
    struct shm_remove
    {
        shm_remove() { shared_memory_object::remove("myshm"); }
        ~shm_remove() { shared_memory_object::remove("myshm"); }
    } remover;

    //Create a shared memory object.
    shared_memory_object shm(create_only, "myshm", read_write);

    //Set size
    shm.truncate(1000);

    //Map the whole shared memory in this process
    mapped_region region(shm, read_write);

    //Write all the memory to 1
    std::memset(region.get_address(), 1, region.get_size());

    std::system("pause");
}

And my python code:

from multiprocessing import shared_memory

if __name__ == "__main__":
    shm_a = shared_memory.SharedMemory(name="myshm", create=False)
    buffer = shm_a.buf
    print(buffer[0])

I get a system error FileNotFoundError: [WinError 2] : File not found. So I guess it only works internally in Python multiprocessing, right ? Python seems not to find the shared memory created on C++ side.

Another possibility would be to use mmap but I'm afraid that's not as fast as "pure" shared memory (without using the filesystem). As stated by the Boost.interprocess documentation:

However, as the operating system has to synchronize the file contents with the memory contents, memory-mapped files are not as fast as shared memory

I don't know to what extent it is slower however. I just would prefer the fastest solution as this is the bottleneck of my application for now.


Solution

  • So I spent the last days implementing shared memory using mmap, and the results are quite good in my opinion. Here are the benchmarks results comparing my two implementations: pure TCP and mix of TCP and shared memory.

    Protocol:

    Benchmark consists of moving data from C++ to Python world (using python's numpy.nparray), then data sent back to C++ process. No further processing is involved, only serialization, deserialization and inter-process communication (IPC).

    Case A:

    Communication is done with TCP {header + data}.

    Case B:

    • One C++ process implementing TCP communication using Boost.Asio and shared memory (mmap) using Boost.Interprocess
    • One Python3 process using standard TCP sockets and mmap

    Communication is hybrid : synchronization is done through sockets (only header is passed) and data is moved through shared memory. I think this design is great because I have suffered in the past from problem of synchronization using condition variable in shared memory, and TCP is easy to use in both C++ and Python environments.

    Results:

    Small data at high frequency

    200 MBytes/s total: 10 MByte sample at 20 samples per second

    Case Global CPU consumption C++ part python part
    A 17.5 % 10% 7.5%
    B 6% 1% 5%

    Big data at low frequency

    200 MBytes/s total: 0.2 MByte sample at 1000 samples per second

    Case Global CPU consumption C++ part python part
    A 13.5 % 6.7% 6.8%
    B 11% 5.5% 5.5%

    Max bandwidth

    • A : 250 MBytes / second
    • B : 600 MBytes / second

    Conclusion:

    In my application, using mmap has a huge impact on big data at average frequency (almost 300 % performance gain). When using very high frequencies and small data, the benefit of shared memory is still there but not that impressive (only 20% improvement). Maximum throughput is more than 2 times bigger.

    Using mmap is a good upgrade for me. I just wanted to share my results here.