In C++, can atomics suffer spurious stores?
For example, suppose that m and n are atomics and that m = 5 initially. In thread 1,
m += 2;
In thread 2,
n = m;
Result: the final value of n should be either 5 or 7, right? But could it spuriously be 6? Could it spuriously be 4 or 8, or even something else?
In other words, does the C++ memory model forbid thread 1 from behaving as though it did this?
++m;
++m;
Or, more weirdly, as though it did this?
tmp = m;
m = 4;
tmp += 2;
m = tmp;
Reference: H.-J. Boehm & S. V. Adve, 2008, Figure 1. (If you follow the link, then, in the paper's section 1, see the first bulleted item: "The informal specifications provided by ...")
THE QUESTION IN ALTERNATE FORM
One answer (appreciated) shows that the question above can be misunderstood. If helpful, then here is the question in alternate form.
Suppose that the programmer tried to tell thread 1 to skip the operation:
bool a = false;
if (a) m += 2;
Does the C++ memory model forbid thread 1 from behaving, at run time, as though it did this?
m += 2; // speculatively alter m
m -= 2; // oops, should not have altered! reverse the alteration
I ask because Boehm and Adve, earlier linked, seem to explain that a multithreaded execution can
COMPILABLE SAMPLE CODE
Here is some code you can actually compile, if you wish.
#include <iostream>
#include <atomic>
#include <thread>
// For the original question, do_alter = true.
// For the question in alternate form, do_alter = false.
constexpr bool do_alter = true;
void f1(std::atomic_int *const p, const bool do_alter_)
{
    if (do_alter_) p->fetch_add(2, std::memory_order_relaxed);
}
void f2(const std::atomic_int *const p, std::atomic_int *const q)
{
    q->store(
        p->load(std::memory_order_relaxed),
        std::memory_order_relaxed
    );
}
int main()
{
    std::atomic_int m(5);
    std::atomic_int n(0);
    std::thread t1(f1, &m, do_alter);
    std::thread t2(f2, &m, &n);
    t2.join();
    t1.join();
    std::cout << n << "\n";
    return 0;
}
This code always prints 5 or 7 when I run it. (In fact, as far as I can tell, it always prints 7 when I run it.) However, I see nothing in the semantics that would prevent it from printing 6, 4 or 8.
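One way to probe this empirically is to repeat the experiment many times and collect every value that n ends up holding. The following is only a sketch: observed_values is a hypothetical helper, not part of the sample above, and a test like this can only fail to find a counterexample. It cannot establish what the memory model permits.

```cpp
#include <atomic>
#include <cassert>
#include <set>
#include <thread>

// Hypothetical helper: runs the two-thread experiment `trials` times and
// returns the set of distinct values that n held at the end of each run.
std::set<int> observed_values(int trials)
{
    assert(trials > 0);
    std::set<int> seen;
    for (int i = 0; i < trials; ++i) {
        std::atomic_int m(5);
        std::atomic_int n(0);
        // Thread 1: m += 2, as a single atomic read-modify-write.
        std::thread t1([&m] { m.fetch_add(2, std::memory_order_relaxed); });
        // Thread 2: n = m.
        std::thread t2([&m, &n] {
            n.store(m.load(std::memory_order_relaxed),
                    std::memory_order_relaxed);
        });
        t1.join();
        t2.join();
        seen.insert(n.load());
    }
    return seen;
}
```

Calling observed_values(1000) and printing the returned set shows which outcomes occurred on a given machine. A 6, 4 or 8 in that set would be a spurious store made visible.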
The excellent Cppreference.com states, "Atomic objects are free of data races," which is nice, but in such a context as this, what does it mean?
Undoubtedly, all this means that I do not understand the semantics very well. Any illumination you can shed on the question would be appreciated.
ANSWERS
@Christophe, @ZalmanStern and @BenVoigt each illuminate the question with skill. Their answers cooperate rather than compete. In my opinion, readers should heed all three answers: @Christophe first; @ZalmanStern second; and @BenVoigt last to sum up.
The existing answers provide a lot of good explanation, but they fail to give a direct answer to your question. Here we go:
can atomics suffer spurious stores?
Only volatile is actually prohibited from performing extra memory accesses.
does the C++ memory model forbid thread 1 from behaving as though it did this?
++m; ++m;
Yes, but this one is allowed:
lock (shared_std_atomic_secret_lock) { ++m; ++m; }
It's allowed but stupid. A more realistic possibility is turning this:
std::atomic<int64_t> m;
++m;
into
memory_bus_lock
{
    ++m.low;
    if (last_operation_did_carry)
        ++m.high;
}
where memory_bus_lock and last_operation_did_carry are features of the hardware platform that can't be expressed in portable C++.
Note that peripherals sitting on the memory bus do see the intermediate value, but can interpret this situation correctly by looking at the memory bus lock. Software debuggers won't be able to see the intermediate value.
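The low/high increment can be imitated in portable C++ if a std::mutex stands in for memory_bus_lock and an unsigned wraparound test stands in for last_operation_did_carry. This is a sketch under those assumptions, not how any real std::atomic is implemented. SplitCounter and reader_saw_torn_value are hypothetical names, and the counter is unsigned here to keep the wraparound well defined.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical emulation of a 64-bit counter built from two 32-bit halves.
class SplitCounter {
public:
    explicit SplitCounter(std::uint64_t v)
        : low_(static_cast<std::uint32_t>(v)),
          high_(static_cast<std::uint32_t>(v >> 32)) {}

    void increment()
    {
        std::lock_guard<std::mutex> guard(lock_);  // stand-in for memory_bus_lock
        ++low_;          // unsigned wraparound is well defined
        if (low_ == 0)   // stand-in for last_operation_did_carry
            ++high_;     // until this runs, memory holds an intermediate value
    }

    std::uint64_t load() const
    {
        std::lock_guard<std::mutex> guard(lock_);
        return (static_cast<std::uint64_t>(high_) << 32) | low_;
    }

private:
    mutable std::mutex lock_;
    std::uint32_t low_;
    std::uint32_t high_;
};

// Returns true if a concurrent reader ever observes anything other than the
// value before or after an increment that carries into the high half.
bool reader_saw_torn_value()
{
    const std::uint64_t before = 0xFFFFFFFFull;
    const std::uint64_t after = before + 1;
    SplitCounter counter(before);
    std::atomic<bool> torn(false);
    std::thread reader([&] {
        for (int i = 0; i < 100000; ++i) {
            const std::uint64_t v = counter.load();
            if (v != before && v != after) torn.store(true);
        }
    });
    counter.increment();
    reader.join();
    assert(counter.load() == after);
    return torn.load();
}
```

The intermediate state (low half wrapped to zero, high half not yet bumped) really exists in memory, but every reader that goes through load() takes the same lock and so never sees it.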
In other cases, atomic operations can be implemented by software locks, in which case:
memcpy to read the atomic object) they can observe intermediate values. Formally, that's undefined behavior.
One last important point. The "speculative write" is a very complex scenario. It's easier to see this if we rename the condition:
Thread #1
if (my_mutex.is_held) o += 2; // o is an ordinary variable, not atomic or volatile
return o;
Thread #2
{
    scoped_lock l(my_mutex);
    return o;
}
There's no data race here. If Thread #1 has the mutex locked, the write and read can't occur unordered. If it doesn't have the mutex locked, the threads run unordered but both are performing only reads.
Therefore the compiler cannot allow intermediate values to be seen. This C++ code is not a correct rewrite:
o += 2;
if (!my_mutex.is_held) o -= 2;
because the compiler invented a data race. However, if the hardware platform provides a mechanism for race-free speculative writes (Itanium perhaps?), the compiler can use it. So hardware might see intermediate values, even though C++ code cannot.
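The Thread #1 / Thread #2 example can be made compilable, with one assumption: std::mutex has no is_held query, so the sketch below uses a hypothetical OwnedMutex wrapper that records which thread owns the lock. The names OwnedMutex, held_by_me, thread1_body and thread2_body are illustrative only.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical wrapper: remembers which thread currently holds the lock.
class OwnedMutex {
public:
    void lock()
    {
        mutex_.lock();
        owner_.store(std::this_thread::get_id(), std::memory_order_relaxed);
    }

    void unlock()
    {
        assert(held_by_me());  // only the owner may unlock
        owner_.store(std::thread::id(), std::memory_order_relaxed);
        mutex_.unlock();
    }

    // True only when the calling thread holds the lock.
    bool held_by_me() const
    {
        return owner_.load(std::memory_order_relaxed)
               == std::this_thread::get_id();
    }

private:
    std::mutex mutex_;
    std::atomic<std::thread::id> owner_{std::thread::id()};
};

// Thread #1: writes the ordinary variable o only while holding the mutex.
int thread1_body(OwnedMutex& my_mutex, int& o)
{
    if (my_mutex.held_by_me()) o += 2;
    return o;
}

// Thread #2: reads o under the lock.
int thread2_body(OwnedMutex& my_mutex, int& o)
{
    std::lock_guard<OwnedMutex> l(my_mutex);
    return o;
}
```

When the two bodies run on separate threads and neither caller holds the lock beforehand, both only read o, so there is no data race as written. A transformation that wrote o unconditionally and then undid the write would add a store that races with the read in thread2_body.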
If intermediate values shouldn't be seen by hardware, you need to use volatile (possibly in addition to atomics, because volatile read-modify-write is not guaranteed atomic). With volatile, asking for an operation which can't be performed as-written will result in compilation failure, not spurious memory access.
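For reference, the two qualifiers do combine syntactically, because std::atomic provides volatile-qualified overloads of its member functions. The following is a minimal sketch with a hypothetical function name. It shows only that the combination compiles and keeps the read-modify-write atomic. What the hardware observes on the bus is platform-specific and is not demonstrated here.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical example: an object that is both volatile and atomic.
int bump_volatile_atomic()
{
    volatile std::atomic<int> m(5);
    // fetch_add returns the value held immediately before the addition.
    const int old = m.fetch_add(2, std::memory_order_relaxed);
    assert(old == 5);
    return m.load(std::memory_order_relaxed);
}
```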