The following code (minimized from a large project) causes an EXC_BAD_INSTRUCTION crash when built with Xcode 7.3.1 and Boost 1.61 for iOS:
main.mm:
#include "stdio.h"
#include "boost/lockfree/queue.hpp"
int main(int argc, char * argv[]) {
printf("Test1 in\n");
boost::lockfree::queue<int*> q(100);
printf("Test1 out\n");
return 0;
}
The stack trace suggests that the problem comes from a C++ atomic operation:
#0 0x0000000100047a78 in std::__1::__atomic_base<boost::lockfree::detail::tagged_ptr<boost::lockfree::detail::freelist_stack<boost::lockfree::queue<int*, boost::parameter::void_, boost::parameter::void_, boost::parameter::void_>::node, std::__1::allocator<boost::lockfree::queue<int*, boost::parameter::void_, boost::parameter::void_, boost::parameter::void_>::node> >::freelist_node>, false>::store(boost::lockfree::detail::tagged_ptr<boost::lockfree::detail::freelist_stack<boost::lockfree::queue<int*, boost::parameter::void_, boost::parameter::void_, boost::parameter::void_>::node, std::__1::allocator<boost::lockfree::queue<int*, boost::parameter::void_, boost::parameter::void_, boost::parameter::void_>::node> >::freelist_node>, std::__1::memory_order) [inlined] at /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/atomic:842
#1 0x0000000100047a74 in boost::lockfree::detail::freelist_stack<boost::lockfree::queue<int*, boost::parameter::void_, boost::parameter::void_, boost::parameter::void_>::node, std::__1::allocator<boost::lockfree::queue<int*, boost::parameter::void_, boost::parameter::void_, boost::parameter::void_>::node> >::deallocate_impl_unsafe(boost::lockfree::queue<int*, boost::parameter::void_, boost::parameter::void_, boost::parameter::void_>::node*) at /Users/deinzer/src/pipeline.ios/boost/boost/lockfree/detail/freelist.hpp:251
#2 0x00000001000479e8 in boost::lockfree::detail::freelist_stack<boost::lockfree::queue<int*, boost::parameter::void_, boost::parameter::void_, boost::parameter::void_>::node, std::__1::allocator<boost::lockfree::queue<int*, boost::parameter::void_, boost::parameter::void_, boost::parameter::void_>::node> >::freelist_stack<std::__1::allocator<boost::lockfree::queue<int*, boost::parameter::void_, boost::parameter::void_, boost::parameter::void_>::node> >(std::__1::allocator<boost::lockfree::queue<int*, boost::parameter::void_, boost::parameter::void_, boost::parameter::void_>::node> const&, unsigned long) at /Users/deinzer/src/pipeline.ios/boost/boost/lockfree/detail/freelist.hpp:64
#3 0x00000001000478e0 in boost::lockfree::queue<int*, boost::parameter::void_, boost::parameter::void_, boost::parameter::void_>::queue(unsigned long) at /Users/deinzer/src/pipeline.ios/boost/boost/lockfree/queue.hpp:205
#4 0x0000000100047840 in main at /Users/deinzer/src/iostester/lockfreecrash/lockfree_crash/lockfree_crash/main.mm:7
#5 0x00000001821d68b8 in start ()
The disassembly output shows the illegal opcode:
0x100047a5c <+72>: mov x20, x0
0x100047a60 <+76>: add x0, sp, #16 ; =16
0x100047a64 <+80>: bl 0x100047aac ; boost::lockfree::detail::tagged_ptr<boost::lockfree::detail::freelist_stack<boost::lockfree::queue<int*, boost::parameter::void_, boost::parameter::void_, boost::parameter::void_>::node, std::__1::allocator<boost::lockfree::queue<int*, boost::parameter::void_, boost::parameter::void_, boost::parameter::void_>::node> >::freelist_node>::get_ptr at tagged_ptr_dcas.hpp:78
0x100047a68 <+84>: mov x1, x0
0x100047a6c <+88>: mov x0, x20
0x100047a70 <+92>: bl 0x100047aa4 ; boost::lockfree::detail::tagged_ptr<boost::lockfree::detail::freelist_stack<boost::lockfree::queue<int*, boost::parameter::void_, boost::parameter::void_, boost::parameter::void_>::node, std::__1::allocator<boost::lockfree::queue<int*, boost::parameter::void_, boost::parameter::void_, boost::parameter::void_>::node> >::freelist_node>::set_ptr at tagged_ptr_dcas.hpp:83
0x100047a74 <+96>: ldp x9, x8, [sp]
-> 0x100047a78 <+100>: .long 0xc87f7e7f ; unknown opcode
0x100047a7c <+104>: stxp w10, x9, x8, [x19]
0x100047a80 <+108>: cbnz w10, 0x100047a78 ; <+100> [inlined] std::__1::__atomic_base<boost::lockfree::detail::tagged_ptr<boost::lockfree::detail::freelist_stack<boost::lockfree::queue<int*, boost::parameter::void_, boost::parameter::void_, boost::parameter::void_>::node, std::__1::allocator<boost::lockfree::queue<int*, boost::parameter::void_, boost::parameter::void_, boost::parameter::void_>::node> >::freelist_node>, false>::store(boost::lockfree::detail::tagged_ptr<boost::lockfree::detail::freelist_stack<boost::lockfree::queue<int*, boost::parameter::void_, boost::parameter::void_, boost::parameter::void_>::node, std::__1::allocator<boost::lockfree::queue<int*, boost::parameter::void_, boost::parameter::void_, boost::parameter::void_>::node> >::freelist_node>, std::__1::memory_order) + 4 at freelist.hpp:251
The problem only occurs, when
The code runs fine, if
I would like to know the root cause of the problem. Is it a clang issue, a std::atomic or boost bug? What can be done to avoid this problem?
The instruction your disassembler refuses to acknowledge is ldxp xzr, xzr, [x19] - in other words, a load to prime the exclusive monitor so that the store-exclusive will succeed (or fail and restart if there really was some concurrent memory access, so the atomicity of the store can be guaranteed). Since that's the only aspect that matters in this case, i.e. we don't care about the actual data loaded, it's cheekily using the zero register as the target to simply discard the data and avoid having to allocate scratch registers to load into.
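For anyone who wants to check that reading of the raw word by hand, here is a minimal sketch that pulls the register fields out of 0xc87f7e7f. The helper names (decode_ldxp64, reg_name) are made up for illustration, and the bit layout is my reading of the A64 load-exclusive-pair encoding (Rt in bits 4:0, Rn in bits 9:5, Rt2 in bits 14:10), so treat it as an assumption rather than a reference decoder.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: printable name of register number n.
// Number 31 means xzr in a data position and sp in a base position.
std::string reg_name(unsigned n, bool is_base) {
    if (n == 31) return is_base ? std::string("sp") : std::string("xzr");
    return "x" + std::to_string(n);
}

// Hypothetical helper: decode one 64-bit LDXP instruction word.
// Assumed layout: bits 31..15 are fixed for this instruction,
// then Rt2 = bits 14:10, Rn = bits 9:5, Rt = bits 4:0.
std::string decode_ldxp64(std::uint32_t insn) {
    assert((insn & 0xFFFF8000u) == 0xC87F0000u && "not a 64-bit LDXP word");
    const unsigned rt  = insn & 0x1Fu;          // first load target
    const unsigned rn  = (insn >> 5) & 0x1Fu;   // base address register
    const unsigned rt2 = (insn >> 10) & 0x1Fu;  // second load target
    return "ldxp " + reg_name(rt, false) + ", " + reg_name(rt2, false) +
           ", [" + reg_name(rn, true) + "]";
}
```

Feeding it the word from the crash site, decode_ldxp64(0xc87f7e7f), gives both target fields as register 31 and the base as x19.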
The problem here is that using the same register for both targets of a load pair is architecturally unpredictable. This must be a bug in either Boost or Clang, depending on whether the offending instruction has come from some explicit assembly code or a compiler-internal implementation. From unpicking those templates, I think it's in std::atomic, but as my knowledge stops after C++98 I'm not really sure where that points the finger.
To quote from the ldxp section of the ARMv8 ARM unpredictable behaviour appendix:
If t == t2, then one of the following behaviors must occur:
- The instruction is UNDEFINED.
- The instruction executes as a NOP.
- The instruction performs a load using the specified addressing mode, and the base register [sic] is set to an UNKNOWN value.
It's quite possible that on some CPUs where the designers went for the third option, this code would end up working as expected (and indeed could have been tested to do so). Apple's CPU designers, however, would seem to have taken the first option on at least whichever of their cores is in the device in question, hence bang.
What the __atomic_base::store() implementation should be able to do to fix it neatly is simply reuse the scratch register allocated for the store-exclusive status instead of one of the xzrs, e.g. ldxp xzr, x10, [x19] for this example. That should make the instruction well-defined without affecting any other code (the following stxp will always overwrite the whole register unconditionally), and without requiring the optimiser to allocate additional registers. One could conceivably write a tool to post-process the compiled binary, scanning for the relevant instruction pairs and fixing up the load operands thusly, but it's probably more sensible to just file the appropriate bug report and get it fixed at the source - as it turns out, the underlying optimiser problem I suspected has been reported against upstream LLVM already.
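To make the post-processing idea concrete, here is a rough sketch of what the scan-and-patch step might look like. It is not a tested tool: patch_ldxp_same_target is a hypothetical name, the code is assumed to have been extracted already as a vector of 32-bit instruction words, the masks are my reading of the A64 exclusive-pair encodings, and it only handles the exact pattern from the disassembly above (a load pair into xzr, xzr immediately followed by the store pair on the same base register).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace {
// Assumed encodings: LDXP/LDAXP and STXP/STLXP, 32- or 64-bit variants.
constexpr std::uint32_t kLoadPairMask  = 0xBFFF0000u;
constexpr std::uint32_t kLoadPairBits  = 0x887F0000u;
constexpr std::uint32_t kStorePairMask = 0xBFE00000u;
constexpr std::uint32_t kStorePairBits = 0x88200000u;

// Extract a 5-bit register field starting at bit position lsb.
constexpr unsigned field(std::uint32_t insn, unsigned lsb) {
    return (insn >> lsb) & 0x1Fu;
}
}  // namespace

// Hypothetical fix-up pass: rewrite "ldxp xzr, xzr, [xN]" so that its second
// target is the status register of the store-exclusive pair that follows it.
// Returns the number of instructions patched.
std::size_t patch_ldxp_same_target(std::vector<std::uint32_t>& code) {
    std::size_t patched = 0;
    for (std::size_t i = 0; i + 1 < code.size(); ++i) {
        const std::uint32_t load = code[i];
        const std::uint32_t store = code[i + 1];
        if ((load & kLoadPairMask) != kLoadPairBits) continue;
        if ((store & kStorePairMask) != kStorePairBits) continue;

        const unsigned base = field(load, 5);
        const unsigned status = field(store, 16);
        // Only touch loads that discard both halves into the zero register.
        if (field(load, 0) != 31 || field(load, 10) != 31) continue;
        // The store must use the same base, and the status register must be
        // a real register that is not the base.
        if (field(store, 5) != base) continue;
        if (status == 31 || status == base) continue;

        code[i] = (load & ~(0x1Fu << 10)) | (status << 10);
        assert(field(code[i], 0) != field(code[i], 10));  // targets now differ
        ++patched;
    }
    return patched;
}
```

Run over the two words from the crash site, 0xc87f7e7f followed by the stxp w10, x9, x8, [x19] word, this would turn the load into the ldxp xzr, x10, [x19] form suggested above and leave the store untouched. A real tool would also have to deal with code signing and with locating the code section, which is part of why a compiler fix is the saner route.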