Does inline asm access to shared variables count as a data race in C++11?

#include <thread>
int i;
int main()
{
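    // i is a global, so the lambdas can use it directly; capturing a
    // variable with static storage duration ([&i]) is ill-formed.
    std::thread t1([]() {
        asm volatile("movl $1, %0"
                     : "=m"(i));
    });
    std::thread t2([]() {
        asm volatile("movl $2, %0"
                     : "=m"(i));
    });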
    t2.join();
    t1.join();
    return 0;
}

Is it a data race (and, as a result, UB) in C++? Let's assume that the address of i is aligned and that on our CPU loads and stores of aligned double words are atomic.

It probably satisfies the standard's definition of a data race.

Solution

The standard leaves inline asm implementation-defined, for obvious reasons. To say anything about this, we have to make stuff up and hand-wave about things.

More importantly, we're no longer considering pure ISO C++, but rather the GNU dialect of C++, which defines a lot of behaviour that ISO C++ leaves Undefined. For example, the GCC manual says type-punning by writing one union member and reading another is well-defined in GNU C++, even though it's UB in ISO C++. A lot of things are still UB in GNU C++, and "whatever g++ actually does" doesn't count as a definition. See the manual's implementation-defined behaviour section in the table of contents.
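As a small illustration of that dialect difference, here is a sketch of the union type-pun mentioned above. The names float_bits and check_float_bits are made up for this example, and it assumes a 32-bit IEEE-754 float:

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Hypothetical helper: returns the object representation of f.
inline std::uint32_t float_bits(float f)
{
    union { float as_float; std::uint32_t as_bits; } pun;
    pun.as_float = f;
    // Reading the member that was not last written: documented as
    // well-defined in GNU C++, but Undefined Behaviour in ISO C++.
    return pun.as_bits;
}

inline void check_float_bits()
{
    // 1.0f is 0x3f800000 in IEEE-754 binary32 (assumed here).
    assert(float_bits(1.0f) == 0x3f800000u);
}
```

In code that has to stay within ISO C++, std::memcpy into a std::uint32_t is the usual portable way to get the same bits.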

The C++ -> asm stage of gcc doesn't even understand the instructions in an inline-asm statement; it just fills in the operands and passes the text on to the assembler. It doesn't "think about" what the instructions are doing. It treats the statement as a black box described by the output, input, and clobber constraints.

Since you used an "=m"(i) output operand, your asm statement does interact with the C++ variable in a C++ish way (rather than behind the compiler's back with asm("mov $1, i");). I think the compiler sees it as something like i = __builtin_my_asm_statement();. On top of that, the volatile keyword stops the compiler from deleting the statement as dead code or hoisting it out of a loop. (Per the GCC manual, volatile alone does not order the statement against surrounding non-volatile code; a "memory" clobber is needed for that.)
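A minimal sketch of what the "=m" operand buys you, assuming GCC or Clang targeting x86 with the default AT&T syntax. The helper names store_one and demo_store_one are invented for this example, and the non-x86 branch is only there so the sketch still builds elsewhere:

```cpp
#include <cassert>

// Hypothetical helper: stores 1 into x using the same operand style
// as the question.
inline void store_one(int &x)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    // "=m"(x): the compiler chooses the addressing mode for x and is told
    // that this statement writes x, so it can't keep using a stale copy.
    asm volatile("movl $1, %0"
                 : "=m"(x));
#else
    x = 1;  // fallback for other targets (assumption, not from the post)
#endif
}

inline int demo_store_one()
{
    int local = 0;
    store_one(local);
    // Because local was declared as an output, the compiler must read the
    // value the asm wrote instead of constant-propagating the 0.
    return local;
}
```

Had the store been written as asm("mov $1, i") with no operands, the compiler would have no idea that i changed and could legally keep using an old value held in a register.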

Not every race is a Data Race in the C++-standard, Undefined-Behaviour-causing sense. For example, if i were a std::atomic<int>, the final value of i would still be indeterminate because of the race condition. (The program would be free of C++ UB, though, and i would be either 2 or 1, with no tearing. Undefined Behaviour technically means anything could happen, of course.)
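To make the std::atomic case concrete, here is a sketch of the same two-writer race with no UB. The function name race_on_atomic is made up for this example, and relaxed ordering is my choice to mirror a plain store as closely as possible:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: two threads race to store into one atomic.
inline int race_on_atomic()
{
    std::atomic<int> shared{0};
    std::thread t1([&shared] { shared.store(1, std::memory_order_relaxed); });
    std::thread t2([&shared] { shared.store(2, std::memory_order_relaxed); });
    t1.join();
    t2.join();
    // Which store wins is unspecified, but the result is one of the two
    // stored values, never a torn mix of them.
    int result = shared.load(std::memory_order_relaxed);
    assert(result == 1 || result == 2);
    return result;
}
```

On x86, compilers typically turn each relaxed store into a single mov, much like the asm in the question, but here the C++ standard itself guarantees the outcome.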

So, what can we say about this code:

We can assume that i is naturally aligned, because all the usual x86 ABIs guarantee that.
We know that the asm will include a store to memory as a single instruction, not a copy of one byte at a time (which no sane compiler would do anyway).
I'm not 100% sure that we can guarantee that the store goes directly to the shared value of i, rather than to scratch space on the stack which the compiler will then copy from. An evil compiler that did this would break a lot of code, including anything like the Linux kernel that uses inline asm to run lock-prefixed instructions on shared variables using memory operands.
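The alignment assumption in the first point can be checked rather than taken on faith. This is a sketch, with shared_i standing in for the question's global i and is_naturally_aligned as an invented helper:

```cpp
#include <cassert>
#include <cstdint>

// On the usual x86 ABIs an int's required alignment equals its size.
static_assert(alignof(int) == sizeof(int),
              "sketch assumes int is naturally aligned by the ABI");

static int shared_i;  // stand-in for the question's global i

// Hypothetical helper: true if p sits on a sizeof(int) boundary.
inline bool is_naturally_aligned(const int *p)
{
    return reinterpret_cast<std::uintptr_t>(p) % sizeof(int) == 0;
}
```

Calling is_naturally_aligned(&shared_i) should return true on those ABIs. This only confirms the alignment precondition for an atomic hardware store; it says nothing about what the compiler is allowed to do with the access.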

So if we can assume a non-malicious compiler, we can be pretty sure what the compiler output will do. Or perhaps the behaviour of an "=m" operand on a shared value should be considered well-defined behaviour in GNU C, so we can say that this is well-defined.