Tags: c11, memory-model, memory-fences

Fences with non-atomics in C11


Is there any way to use fences to reason about the behavior of non-atomic operations in C11? Specifically, I'd like to make code safe in situations where certain fields are required to be ints for compatibility with old interfaces that might, say, read and write data structures to files or pass them as system call arguments. As there's no requirement that an atomic_int even be the same size as an int, I can't use an atomic_int.

Here's a minimal working example that unfortunately produces undefined behavior according to section 5.1.2.4 paragraph 25, because of the data race on ready:

#include <stdatomic.h>
#include <stdio.h>
#include <threads.h>

int ready;  /* purposely NOT _Atomic */
int value;

void
p1(void)
{
  value = 1;
  atomic_thread_fence(memory_order_release);
  ready = 1;
}

int   /* thrd_start_t requires int (*)(void *) */
p2(void *_ignored)
{
  (void)_ignored;
  while (!ready)
    ;
  atomic_thread_fence(memory_order_acquire);
  printf("%d\n", value);
  return 0;
}

int
main(void)
{
  thrd_t t;
  thrd_create(&t, p2, NULL);
  p1();
  thrd_join(t, NULL);   /* thrd_join takes the thrd_t by value */
  return 0;
}

My specific question is whether it's possible to fix the above code to guarantee printing 1 without changing ready to an _Atomic. (I could make ready a volatile, but don't see any suggestion in the spec that this would help.)

A related question is whether it's safe to write the above code anyway, because any machine my code will run on has cache coherence? I'm aware that many things go wrong when C11 programs contain so-called benign races, so I'm really looking for the specifics of what a plausible compiler and architecture could do to the above code rather than general warnings about data races and undefined behavior.


Solution

  • Is there any way to use fences to reason about the behavior of non-atomic operations in C11?

    The way you use the fences is correct, but fences alone cannot order non-atomic accesses. In C11 (7.17.4p2), a release fence synchronizes with an acquire fence only through an atomic object: the release fence must be sequenced before an atomic store to that object, the acquire fence must be sequenced after an atomic load of it, and that load must read the value written by the store. Providing that atomic store/load pair is exactly the job of an atomic variable. As written, ready is a plain int, so there is a data race on it (as you pointed out) and the behavior is undefined.

    My specific question is whether it's possible to fix the above code to guarantee printing 1 without changing ready to an _Atomic. (I could make ready a volatile, but don't see any suggestion in the spec that this would help.)

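    The standard-conforming answer is no. Making ready volatile does not change this: volatile does not make an access atomic, so conflicting concurrent accesses to a volatile int are still a data race, and the standard gives volatile no role in inter-thread synchronization.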

    However, the standard is deliberately strict because it has to be implementable on a wide range of architectures. That does not mean a data race will cause visible problems on every platform.

    Using non-atomic types in a shared context is tricky, though. People sometimes assume that if the CPU reads and writes a type such as int indivisibly, that type can substitute for atomic_int. It cannot, because 'atomic' is a concept with wider ramifications:

    • indivisible reads/writes - On many platforms, plain (suitably aligned) int accesses already have this property.

    • restricted optimizations - Because a data race is undefined behavior, the compiler may optimize as if no other thread touches the variable, and this can break racy code in many unexpected ways. A compiler may reorder memory operations, merge several accesses into one, hoist a load out of a loop and keep the value in a register, and so on. Declaring the variable volatile prevents much of this: the compiler must perform every volatile access as written and may not reorder volatile accesses relative to one another (although it may still move non-volatile accesses around them).

    • data synchronization between cores - In your case this is handled by the fences, provided that the load of ready actually observes the store to it (which, per the standard, requires atomic operations on ready). With a real atomic_int, relaxed loads and stores on ready would have been enough, because the fences supply the ordering.
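    For comparison, here is a sketch of the standard-conforming version of the question's program, assuming only the flag is allowed to change type: ready becomes an atomic_int accessed with relaxed operations, while value stays a plain int ordered by the same pair of fences. The names producer, consumer and run_fence_example are hypothetical, and the sketch assumes an implementation that provides <threads.h>.

```c
#include <stdatomic.h>
#include <stdio.h>
#include <threads.h>

static atomic_int ready;  /* only the flag is atomic (hypothetical change) */
static int value;         /* stays a plain int */

static int producer(void *arg)
{
    (void)arg;
    value = 1;                                  /* plain store */
    atomic_thread_fence(memory_order_release);  /* release fence ...          */
    atomic_store_explicit(&ready, 1, memory_order_relaxed); /* ... before atomic store X */
    return 0;
}

static int consumer(void *out)
{
    /* Atomic load Y: spins until it reads the value written by X. */
    while (!atomic_load_explicit(&ready, memory_order_relaxed))
        ;
    /* Y read X's value, so the release fence synchronizes with this fence. */
    atomic_thread_fence(memory_order_acquire);
    *(int *)out = value;                        /* race-free, sees 1 */
    return 0;
}

/* Runs one round and returns the value the consumer observed. */
int run_fence_example(void)
{
    thrd_t t;
    int seen = 0;

    atomic_store_explicit(&ready, 0, memory_order_relaxed);
    value = 0;
    if (thrd_create(&t, consumer, &seen) != thrd_success)
        return -1;
    producer(NULL);
    thrd_join(t, NULL);  /* join makes `seen` visible to this thread */
    printf("%d\n", seen);
    return seen;
}
```

    Only the accesses to the flag have to be atomic for the fence pairing to be valid; the plain read of value is then ordered after the plain write. This does not by itself resolve the int-layout constraint from the question, it only shows which access the standard requires to be atomic.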

    Whether your code works depends on the platform and compiler, but at the very least declare the ready flag volatile. In a test run on x86-64 with gcc -O3, the version without volatile got stuck in an endless loop, most likely because the load of ready was hoisted out of the loop.
    It is also a good idea to compare the instructions the compiler emits for the atomic and non-atomic cases.
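    A minimal sketch for that comparison, assuming gcc or clang: three hypothetical spin-wait helpers that differ only in how the flag is declared. Compiling the file with gcc -O3 -S and reading the three loops side by side shows whether the compiler re-reads the flag on every iteration. The variable and function names are illustrative and not from the question.

```c
#include <stdatomic.h>

/* Hypothetical flags, one per variant. */
int plain_ready;              /* non-atomic, as in the question */
volatile int volatile_ready;  /* volatile: every access must be performed */
atomic_int atomic_ready;      /* atomic: the conforming choice */

void wait_plain(void)
{
    /* The compiler may load plain_ready once, before the loop. */
    while (!plain_ready)
        ;
}

void wait_volatile(void)
{
    /* Each iteration must re-read volatile_ready. */
    while (!volatile_ready)
        ;
}

void wait_atomic(void)
{
    /* Relaxed atomic load; in practice compilers emit a fresh load each iteration. */
    while (!atomic_load_explicit(&atomic_ready, memory_order_relaxed))
        ;
}
```

    On x86-64 a relaxed atomic load is an ordinary mov, so the atomic loop costs about the same as the volatile one while also being well-defined; the plain loop is the one that may be reduced to a single check.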

    A related question is whether it's safe to write the above code anyway, because any machine my code will run on has cache coherence?

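    You definitely want cache coherence, because systems that lack it are notoriously hard to program, and the code as written will almost certainly not work without it. Coherence is not sufficient, though: the compiler transformations described above happen before the hardware is involved, so coherent caches do not stop the compiler from, for example, hoisting the load of ready out of the loop.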