While comparing the assembly generated for std::shared_ptr vs. boost::shared_ptr, I noticed that GCC generates a whole lot more code for
void test_copy(const std::shared_ptr<int> &sp) { auto copy = sp; }
(https://godbolt.org/z/efTW6MoEh – more than 70 lines of assembler) than for the boost version, on which GCC's implementation of shared_ptr is based:
void test_copy(const boost::shared_ptr<int> &sp) { auto copy = sp; }
(https://godbolt.org/z/3aoGq1f9P – around 30 lines of assembler).
In particular, I'm puzzled by the following instruction in the std::shared_ptr version, a mention of which I can't (readily) find in the sources.
movq __gthrw___pthread_key_create(unsigned int*, void (*)(void*))@GOTPCREL(%rip), %rbx
Can someone shed some light as to why std::shared_ptr generates so much more code than boost::shared_ptr? Am I missing some magic command line option?
I think this is because GCC's libstdc++ is checking whether the program is actually multithreaded. If it's not, then it can skip the expensive locked instructions to atomically modify the reference counter, and revert to ordinary unlocked instructions. Boost doesn't have this feature and uses the locked instructions unconditionally.
For instance, in the libstdc++ code, you'll notice that if the pointer __gthrw___pthread_key_create is null, we increment and decrement the reference counter at [rbp+8] with simple non-atomic instructions (lines 12 and 16-18 of the assembly). But if it's not null, we branch to a section where locked add/xadd are done (lines 52-58). That symbol is a weak reference declared by GCC's gthr threading layer, so it resolves to null when the pthread library is not linked into the program.
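The branch described above can be sketched roughly as follows. This is an illustration of the technique, not the libstdc++ source: key_create_fn, stub_key_create, add_ref and release are hypothetical names, and the weak symbol is modelled as an ordinary function pointer argument so the example behaves the same everywhere. The __atomic_fetch_add and __atomic_fetch_sub calls are GCC/Clang builtins.

```cpp
// Hypothetical stand-in for the weak symbol: a null pointer means
// "no threading library is linked in".
using key_create_fn = int (*)(unsigned int*, void (*)(void*));

// Hypothetical stub, used only to model the "threads are active" case.
inline int stub_key_create(unsigned int*, void (*)(void*)) { return 0; }

// Increment a reference count, choosing the cheap path when possible.
inline void add_ref(int& count, key_create_fn weak_sym) {
    if (weak_sym != nullptr) {
        // Possibly multithreaded: atomic read-modify-write
        // (a lock-prefixed add/xadd on x86).
        __atomic_fetch_add(&count, 1, __ATOMIC_ACQ_REL);
    } else {
        // Single-threaded: a plain, non-atomic increment is enough.
        ++count;
    }
}

// Decrement a reference count; returns true when the last reference is gone.
inline bool release(int& count, key_create_fn weak_sym) {
    if (weak_sym != nullptr) {
        // fetch_sub returns the previous value, so 1 means we hit zero.
        return __atomic_fetch_sub(&count, 1, __ATOMIC_ACQ_REL) == 1;
    }
    return count-- == 1;
}
```

Both paths have to be emitted at every copy site, which would account for a good part of the extra assembly compared with an implementation that always uses the locked instruction.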
I haven't really dug into the source code, but I suspect these details are buried in the references to _Lock_policy.
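For experimenting with that suspicion, here is a minimal sketch that assumes libstdc++ specifically. It relies on the internal (non-standard, subject to change) std::__shared_ptr template, which takes a _Lock_policy as its second parameter; single_threaded_ptr and copy_and_count are hypothetical names. Whether selecting _S_single removes the runtime check in a given GCC version is something to confirm on Compiler Explorer rather than take from this example.

```cpp
#include <memory>
#include <ext/concurrence.h>  // __gnu_cxx::_Lock_policy and _S_single (libstdc++ only)

// Hypothetical alias: a shared pointer whose control block is instantiated
// with the single-threaded lock policy instead of the default one.
template <typename T>
using single_threaded_ptr = std::__shared_ptr<T, __gnu_cxx::_S_single>;

// Same shape as test_copy in the question, but returns the use count
// observed while the copy is alive so the behaviour can be checked.
inline long copy_and_count(const single_threaded_ptr<int>& sp) {
    auto copy = sp;
    return copy.use_count();
}
```

Compiling copy_and_count next to the original test_copy should make it easier to see how much of the generated code depends on the lock policy.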