Tags: c++, performance, c++11, concurrency, memory-model

Performance vs. C++ memory model


With the new shared-memory concurrency features of C++11 it is possible for two threads to allocate memory at the same time. Furthermore, since the compiler does not know in advance whether the compiled code will be run by multiple threads at once, it has to assume the worst. Thus, my understanding is that the compiled code has to synchronize trips to the heap in some way. This would then slow down single-threaded code which does not need synchronization.
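For concreteness, here is a minimal sketch of the scenario I have in mind (the worker function is purely illustrative): both threads hit the heap at the same time, with no user-level locking around the allocations.

```cpp
// Two threads allocating concurrently. In C++11, operator new/delete are
// required to be thread-safe, so any synchronization happens inside the
// allocator itself.
#include <memory>
#include <thread>
#include <vector>

void worker()
{
    std::vector<std::unique_ptr<int>> v;
    for (int i = 0; i < 100000; ++i)
        v.push_back(std::unique_ptr<int>(new int(i)));  // trip to the heap
}

int main()
{
    std::thread t1(worker);
    std::thread t2(worker);  // both threads allocate at the same time
    t1.join();
    t2.join();
}
```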

Wouldn't this be in contrast to the C++ dictum that "you only pay for what you use"? Is the overhead so small that it was not considered important? Are there other areas where the C++ memory model slows down code that is, in the end, only ever run single-threaded?


Solution

  • Heap managers indeed need to synchronize, and that's a possible performance problem for multi-threaded code. It's up to the program to mitigate that if necessary. Standard libraries are also reacting, trying to bring in better multi-threaded allocators.

    Edit: Some thoughts about the questions in the second paragraph.

    Even C++ needs to be sufficiently safe to be usable. "YDPFWYU" ("you don't pay for what you use") is nice, but if it means that you have to wrap a mutex around every allocation if you want to use the code in a multi-threaded environment, you have a big problem. It's like exceptions, really: even code that doesn't actively use them should be somewhat aware that it might be used in a context where they exist, and both the programmer and the compiler need to be aware of that. The compiler needs to create exception support code/data structures, while the programmer needs to write exception-safe code. Multi-threading is the same, only worse: any piece of code you write might be used in a multi-threaded environment, so you need to write thread-safe code, and the compiler/environment needs to be aware of threading (forgoing some now-unsafe optimizations and providing a thread-safe allocator).
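    To make "write thread-safe code" concrete, here is a small sketch (the class and its members are invented for this example): a reusable component with shared mutable state has to guard that state itself, because it cannot know whether its callers are multi-threaded.

    ```cpp
    // Illustrative only: a reusable component guarding its own shared state.
    #include <map>
    #include <mutex>
    #include <string>

    class Registry {
    public:
        void add(const std::string& key, int value)
        {
            std::lock_guard<std::mutex> lock(mutex_);  // protect shared map
            entries_[key] = value;
        }

        bool lookup(const std::string& key, int& out) const
        {
            std::lock_guard<std::mutex> lock(mutex_);
            auto it = entries_.find(key);
            if (it == entries_.end())
                return false;
            out = it->second;
            return true;
        }

    private:
        mutable std::mutex mutex_;           // lockable from const members too
        std::map<std::string, int> entries_;
    };
    ```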

    These are the points in C++ where you pay even for what you don't use, as far as the standard is concerned. Your particular compiler might give you an escape hatch (disable exceptions, use single-threaded runtime library), but that's no longer real C++ then.

    That said, even (or especially) if you have a single global allocator lock, the overhead for a single-threaded program is minimal: locks are only expensive under contention. An uncontended mutex lock/unlock is not very significant compared to the rest of the allocator operation.
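    To put rough numbers on that claim, a toy microbenchmark along these lines (my own sketch, not from the original answer; absolute figures depend on platform and allocator) compares a million uncontended lock/unlock pairs with a million new/delete pairs:

    ```cpp
    #include <chrono>
    #include <cstdio>
    #include <mutex>

    int main()
    {
        const int iterations = 1000000;
        std::mutex m;

        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < iterations; ++i) {
            m.lock();    // never contended: this is the only thread
            m.unlock();
        }
        auto t1 = std::chrono::steady_clock::now();

        long long sum = 0;
        for (int i = 0; i < iterations; ++i) {
            int* p = new int(i);  // full allocator round trip
            sum += *p;            // keep the allocation observable
            delete p;
        }
        auto t2 = std::chrono::steady_clock::now();

        using ms = std::chrono::duration<double, std::milli>;
        std::printf("lock/unlock: %.1f ms, new/delete: %.1f ms (sum %lld)\n",
                    ms(t1 - t0).count(), ms(t2 - t1).count(), sum);
    }
    ```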

    Under contention, the story is different, which is where custom allocators possibly come in.
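    A minimal sketch of that idea, assuming fixed-size blocks and a per-thread free list (all names invented for illustration; production-grade answers are thread-caching allocators such as tcmalloc or jemalloc):

    ```cpp
    // Toy per-thread pool: the fast path never touches a lock, so threads
    // allocating heavily do not contend on a single global allocator mutex.
    #include <cstddef>
    #include <new>

    struct Node {
        Node* next;
        char  payload[56];  // fixed block size served by this pool
    };

    class ThreadLocalPool {
    public:
        void* allocate()
        {
            if (free_list_) {                 // fast path: no locking at all
                Node* n = free_list_;
                free_list_ = n->next;
                return n;
            }
            return ::operator new(sizeof(Node));  // slow path: global heap
        }

        void deallocate(void* p)
        {
            Node* n = static_cast<Node*>(p);  // recycle into the local list
            n->next = free_list_;
            free_list_ = n;
        }

    private:
        Node* free_list_ = nullptr;           // blocks are leaked at thread exit
    };

    thread_local ThreadLocalPool tls_pool;    // one pool per thread
    ```

    A real implementation also has to deal with blocks freed on a different thread than the one that allocated them, which is where much of the complexity of such allocators lives.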

    As I briefly mentioned above, one other place where C++ is slowed down very slightly by the mere existence of multi-threading is the prohibition of some particular optimizations. The compiler cannot invent reads and writes (especially writes) to possibly shared variables (like globals or things you handed out a pointer to) in code paths that wouldn't ordinarily have these accesses. This may slow down very specific pieces of code, but overall in a program, it's very unlikely that you'll notice.
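    As a sketch of the kind of transformation that is now off-limits (function and variable names are illustrative):

    ```cpp
    int count;  // possibly shared: another thread may read or write it

    void accumulate_positive(const int* p, int n)
    {
        for (int i = 0; i < n; ++i)
            if (p[i] > 0)
                count += p[i];  // the write must stay conditional
    }

    // A pre-C++11 compiler might have kept `count` in a register and stored
    // it back unconditionally after the loop:
    //
    //     int tmp = count;
    //     for (int i = 0; i < n; ++i)
    //         if (p[i] > 0) tmp += p[i];
    //     count = tmp;   // invented write, even when nothing was added
    //
    // Under the C++11 memory model that invented store could race with (and
    // silently overwrite) another thread's write to `count`, so the compiler
    // may no longer do it.
    ```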