Please have a look at the following code:
#include <pthread.h>
#include <boost/atomic.hpp>
class ReferenceCounted {
public:
ReferenceCounted() : ref_count_(1) {}
void reserve() {
ref_count_.fetch_add(1, boost::memory_order_relaxed);
}
void release() {
if (ref_count_.fetch_sub(1, boost::memory_order_release) == 1) {
boost::atomic_thread_fence(boost::memory_order_acquire);
delete this;
}
}
private:
boost::atomic<int> ref_count_;
};
void* Thread1(void* x) {
static_cast<ReferenceCounted*>(x)->release();
return NULL;
}
void* Thread2(void* x) {
static_cast<ReferenceCounted*>(x)->release();
return NULL;
}
int main() {
ReferenceCounted* obj = new ReferenceCounted();
obj->reserve(); // for Thread1
obj->reserve(); // for Thread2
obj->release(); // for the main()
pthread_t t[2];
pthread_create(&t[0], NULL, Thread1, obj);
pthread_create(&t[1], NULL, Thread2, obj);
pthread_join(t[0], NULL);
pthread_join(t[1], NULL);
}
This is somewhat similar to the Reference counting example from Boost.Atomic.
The main differences are that the embedded ref_count_
is initialized to 1
in the constructor (once the constructor is completed we have a single reference to the ReferenceCounted
object) and that the code doesn't use boost::intrusive_ptr
.
Please don't blame me for using delete this
in the code - this is the pattern that I have in a large code base at work and there's nothing I can do about it right now.
Now this code compiled with clang 3.5
from trunk (details below) and ThreadSanitizer (tsan v2) results in the following output from ThreadSanitizer:
WARNING: ThreadSanitizer: data race (pid=9871)
Write of size 1 at 0x7d040000f7f0 by thread T2:
#0 operator delete(void*) <null>:0 (a.out+0x00000004738b)
#1 ReferenceCounted::release() /home/A.Romanek/tmp/tsan/main.cpp:15 (a.out+0x0000000a2c06)
#2 Thread2(void*) /home/A.Romanek/tmp/tsan/main.cpp:29 (a.out+0x0000000a2833)
Previous atomic write of size 4 at 0x7d040000f7f0 by thread T1:
#0 __tsan_atomic32_fetch_sub <null>:0 (a.out+0x0000000896b6)
#1 boost::atomics::detail::base_atomic<int, int, 4u, true>::fetch_sub(int, boost::memory_order) volatile /home/A.Romanek/tmp/boost/boost_1_55_0/boost/atomic/detail/gcc-atomic.hpp:499 (a.out+0x0000000a3329)
#2 ReferenceCounted::release() /home/A.Romanek/tmp/tsan/main.cpp:13 (a.out+0x0000000a2a71)
#3 Thread1(void*) /home/A.Romanek/tmp/tsan/main.cpp:24 (a.out+0x0000000a27d3)
Location is heap block of size 4 at 0x7d040000f7f0 allocated by main thread:
#0 operator new(unsigned long) <null>:0 (a.out+0x000000046e1d)
#1 main /home/A.Romanek/tmp/tsan/main.cpp:34 (a.out+0x0000000a286f)
Thread T2 (tid=9874, running) created by main thread at:
#0 pthread_create <null>:0 (a.out+0x00000004a2d1)
#1 main /home/A.Romanek/tmp/tsan/main.cpp:40 (a.out+0x0000000a294e)
Thread T1 (tid=9873, finished) created by main thread at:
#0 pthread_create <null>:0 (a.out+0x00000004a2d1)
#1 main /home/A.Romanek/tmp/tsan/main.cpp:39 (a.out+0x0000000a2912)
SUMMARY: ThreadSanitizer: data race ??:0 operator delete(void*)
==================
ThreadSanitizer: reported 1 warnings
The strange thing is that thread T1
does a write of size 1 to the same memory location as thread T2
when doing atomic decrement on the reference counter.
How can the former write be explained? Is it some clean-up performed by the destructor of ReferenceCounted
class?
It is a false positive? Or is the code wrong?
My setup is:
$ uname -a
Linux aromanek-laptop 3.13.0-29-generic #53-Ubuntu SMP Wed Jun 4 21:00:20 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ clang --version
Ubuntu clang version 3.5-1ubuntu1 (trunk) (based on LLVM 3.5)
Target: x86_64-pc-linux-gnu
Thread model: posix
The code is compiled like this:
clang++ main.cpp -I/home/A.Romanek/tmp/boost/boost_1_55_0 -pthread -fsanitize=thread -O0 -g -ggdb3 -fPIE -pie -fPIC
Note that on my machine the implementation of boost::atomic<T>
resolves to __atomic_load_n
family of functions, which ThreadSanitizer claims to understand.
UPDATE 1: the same happens when using clang 3.4
final release.
UPDATE 2: the same problem occurs with -std=c++11
and <atomic>
with both libstdc++ and libc++.
This looks like a false positive.
The thread_fence
in the release()
method enforces all outstanding writes from fetch_sub
-calls to happen-before the fence returns. Therefore, the delete
on the next line cannot race with the previous writes from decreasing the refcount.
Quoting from the book C++ Concurrency in Action:
A release operation synchronizes-with a fence with an
order
ofstd::memory_order_acquire
[...] if that release operation stores a value that's read by an atomic operation prior to the fence on the same thread as the fence.
Since decreasing the refcount is a read-modify-write operation, this should apply here.
To elaborate, the order of operations that we need to ensure is as follows:
2.
and 3.
are synchronized implicitly, as they happen on the same thread. 1.
and 2.
are synchronized since they are both atomic read-modify-write operations on the same value. If these two could race the whole refcounting would be broken in the first place. So what is left is synchronizing 1.
and 3.
.
This is exactly what the fence does. The write from 1.
is a release
operation that is, as we just discussed, synchronized with 2.
, a read on the same value. 3.
, an acquire
fence on the same thread as 2.
, now synchronizes with the write from 1.
as guaranteed by the spec. This happens without requiring an addition acquire
write on the object (as was suggested by @KerrekSB in the comments), which would also work, but might be potentially less efficient due to the additional write.
Bottom line: Don't play around with memory orderings. Even experts get them wrong and their impact on performance is often negligible. So unless you have proven in a profiling run that they kill your performance and you absolutely positively have to optimize this, just pretend they don't exist and stick with the default memory_order_seq_cst
.