I am investigating how many times a copy happens in an IPC request. This helps me decide the best solution for my mobile device.
For example, shared memory is zero-copy, since processes can directly exchange messages.
Most IPC methods used on Linux (e.g., sockets) need two copies: user -> kernel -> user.
Binder in Android needs only one copy, from the sender's user space into kernel space. It then uses mmap() to avoid a second copy.
As far as I know, the IPC connection in ZeroMQ is based on the UNIX domain socket. So does this mean the copy is unavoidable? If so, how many copies are needed?
It might be 4 or 5:
pipe()
Step 5 is omitted if you consume the data directly, e.g. if it's a string and you can use it in place.
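As a rough illustration of the two copies across the kernel boundary on a UNIX domain socket, and of consuming the received data in place, here is a minimal Python sketch. It uses plain sockets rather than ZeroMQ, so ZeroMQ's own internal handling is not shown, and the function name `send_and_receive` is made up for this example.

```python
import socket


def send_and_receive(payload: bytes) -> bytearray:
    """Send payload over a connected socket pair and receive it in place.

    Assumption: payload is small enough to fit in the kernel socket buffer,
    because sending and receiving happen in the same thread here.
    """
    # On Linux and macOS socketpair() defaults to AF_UNIX stream sockets.
    sender, receiver = socket.socketpair()
    try:
        # Copy 1: user-space buffer -> kernel socket buffer.
        sender.sendall(payload)

        # Pre-allocate the destination so recv_into() can fill it directly.
        buf = bytearray(len(payload))
        view = memoryview(buf)
        received = 0
        while received < len(payload):
            # Copy 2: kernel socket buffer -> user-space buffer.
            n = receiver.recv_into(view[received:])
            if n == 0:  # peer closed early
                break
            received += n
        view.release()
        # The caller can use buf as-is; no further application-level copy.
        return buf[:received] if received < len(buf) else buf
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    data = send_and_receive(b"hello over a UNIX domain socket")
    print(data.decode())
```

Anything done with the data after that (deserialising it, converting it to another type) is where extra copies beyond these two would come from.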
The Microelectronics in the CPU
However, it's worth pausing for a moment and considering what a "copy" is, and what's actually going on down at the microelectronics levels with shared memory.
The model we have of shared memory is that data is stored at some address, and all cores can equally access that data at that address. However, that's not strictly the case. The data has to be copied into a core's L1 cache before it can be processed. So the overall transaction in the microelectronics could be:
Something like this will happen on pretty much any modern CPU, mobile or desktop.
So as you can see, there's actually quite a lot of copying of data going on to access shared memory, even though the programming language's memory model disguises that. Note that much the same happens in the hardware when the application software explicitly makes a copy of the data.
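For comparison, this is roughly what "zero-copy" looks like from the software side. It is a minimal sketch using Python's `multiprocessing.shared_memory`. For simplicity both handles live in one process, whereas in real IPC the second handle would be opened in another process using the block's name. The function name `shared_memory_demo` is made up for this example.

```python
from multiprocessing import shared_memory


def shared_memory_demo(message: bytes) -> bytes:
    """Write through one handle and read through another, same memory block.

    Assumption: message is non-empty (a shared block needs size > 0).
    """
    n = len(message)
    writer = shared_memory.SharedMemory(create=True, size=n)
    try:
        # Attach a second handle to the same block by name.
        reader = shared_memory.SharedMemory(name=writer.name)
        try:
            # The producer writes straight into the shared block.
            writer.buf[:n] = message
            # The consumer sees the same bytes with no send/recv in between.
            # bytes() makes a copy here only so that the value can be
            # returned after the block is closed and unlinked.
            seen = bytes(reader.buf[:n])
        finally:
            reader.close()
    finally:
        writer.close()
        writer.unlink()  # remove the block once nobody needs it
    return seen


if __name__ == "__main__":
    print(shared_memory_demo(b"written once, visible to both handles"))
```

There is no explicit copy between writer and reader in this code, but the hardware still has to move the data into each core's cache as described above.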
Throw in the fact that the inter-core network is kept busy with cache coherency traffic for shared memory, and it makes for a busy chip.
Note that this cache coherency traffic is absent in the ZMQ Actor model, because each process has its own separate copy of the data and no other process is accessing or caching it.
It also gets more complicated if the destination thread is scheduled on the same core as the origin thread, because in that case the data is quite possibly still in that core's L1 cache.
So, it's partly a matter of "luck", application and OS design as to what actually happens.
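If you want to take some of the "luck" out of where things get scheduled, Linux lets a process restrict which cores it may run on. A minimal sketch, assuming a platform where Python exposes `os.sched_setaffinity` (Linux does, many others don't); the function name `pin_to_first_allowed_core` is made up for this example, and whether pinning actually helps depends on your workload.

```python
import os


def pin_to_first_allowed_core():
    """Temporarily pin the calling process to a single core.

    Returns the set of cores in effect while pinned, or None where the
    platform does not expose the affinity calls.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # e.g. macOS or Windows

    original = os.sched_getaffinity(0)   # 0 means the calling process
    target = {min(original)}             # pick one core we are allowed to use
    os.sched_setaffinity(0, target)
    try:
        return os.sched_getaffinity(0)
    finally:
        # Restore the original affinity so the rest of the program is unaffected.
        os.sched_setaffinity(0, original)


if __name__ == "__main__":
    print(pin_to_first_allowed_core())
```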