
Possible to use C11 fences to reason about writes from other threads?


Adve and Gharachorloo's report, in Figure 4b, provides the following example of a program that exhibits unexpected behavior in the absence of sequential consistency:

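[Image of Figure 4b not reproduced. As encoded in the MWE below: P1 writes A = 1; P2 writes B = 1 if it reads A == 1; P3 reads A into register1 if it reads B == 1.]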

My question is whether it is possible, using only C11 fences and memory_order_relaxed loads and stores, to ensure that register1, if written, will be written with the value 1. The reason this might be hard to guarantee in the abstract is that P1, P2, and P3 could be at different points in a pathological NUMA network with the property that P2 sees P1's write before P3 does, yet somehow P3 sees P2's write very quickly. The reason this might be hard to guarantee with respect to the C11 spec specifically is that P1's write to A and P2's read of A do not synchronize with each other, and therefore by paragraph 5.1.2.4.26 of the spec will result in undefined behavior. Possibly I can sidestep the undefined behavior through relaxed atomic fetch/store, but I still don't know how to reason transitively about the order seen by P3.

Below is an MWE attempting to solve the problem with fences, but I'm not sure whether it is correct. I'm specifically worried that the release fence is not good enough, because it won't flush p1's store buffer, just p2's. However, it will answer my question if you can argue that the assert will never fail based solely on the C11 standard (as opposed to some other information one might have about a particular compiler and architecture).

#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <threads.h>

atomic_int a = ATOMIC_VAR_INIT(0);
atomic_int b = ATOMIC_VAR_INIT(0);

int
p1(void *_ignored)
{
  atomic_store_explicit(&a, 1, memory_order_relaxed);
  return 0;
}

int
p2(void *_ignored)
{
  if (atomic_load_explicit(&a, memory_order_relaxed)) {
    atomic_thread_fence(memory_order_release); // not good enough?
    atomic_store_explicit(&b, 1, memory_order_relaxed);
  }
  return 0;
}

int
p3(void *_ignored)
{
  int register1 = 1;
  if (atomic_load_explicit(&b, memory_order_relaxed)) {
    atomic_thread_fence(memory_order_acquire);
    register1 = atomic_load_explicit(&a, memory_order_relaxed);
  }
  assert(register1 != 0);
  return 0;
}

int
main()
{
  thrd_t t1, t2, t3;
  thrd_create(&t1, p1, NULL);
  thrd_create(&t2, p2, NULL);
  thrd_create(&t3, p3, NULL);
  thrd_join(t1, NULL);
  thrd_join(t2, NULL);
  thrd_join(t3, NULL);
}

Solution

  • The piece that does the work is the memory_order_acquire fence in p3:

    int
    p3(void *_ignored)
    {
      int register1 = 1;
      if (atomic_load_explicit(&b, memory_order_relaxed)) {
        atomic_thread_fence(memory_order_acquire); // <-- Here
        register1 = atomic_load_explicit(&a, memory_order_relaxed);
      }
      assert(register1 != 0);
      return 0;
    }
    

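    With this fence, the load of a in p2 happens before the load of a in p3. The release fence in p2 is sequenced before the relaxed store to b, and the acquire fence in p3 is sequenced after the relaxed load of b that reads the stored value, so the two fences synchronize with each other (see the fence rules in 7.17.4). The load of a in p2 is sequenced before the release fence, and the load of a in p3 is sequenced after the acquire fence.

    The C11 standard guarantees read-read coherence: the load in p3 must observe either the same modification of a that the earlier (happens-before) load in p2 observed, or a later one in the modification order of a. The load in p2 observed the store in p1 (otherwise b would never have been written), and no later modification of a is possible in your scenario, so the load in p3 must also observe the store in p1.

    So your assertion can never fire.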
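The argument above can be exercised with a small stress harness. This is only a sketch: the names litmus_a, litmus_b, litmus_violations, and run_litmus_trials are hypothetical and not part of the question's code, and it assumes a C11 implementation that provides <threads.h>. It reruns the three threads many times and counts the runs in which p3 read b == 1 and then a == 0.

```c
#include <assert.h>
#include <stdatomic.h>
#include <threads.h>

/* Hypothetical harness state; static atomics start at zero. */
static atomic_int litmus_a;
static atomic_int litmus_b;
static atomic_int litmus_violations;

static int litmus_p1(void *arg)
{
  (void)arg;
  atomic_store_explicit(&litmus_a, 1, memory_order_relaxed);
  return 0;
}

static int litmus_p2(void *arg)
{
  (void)arg;
  if (atomic_load_explicit(&litmus_a, memory_order_relaxed)) {
    /* Release fence: lets the relaxed store to b below synchronize
       with an acquire fence in a thread that reads that store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&litmus_b, 1, memory_order_relaxed);
  }
  return 0;
}

static int litmus_p3(void *arg)
{
  (void)arg;
  if (atomic_load_explicit(&litmus_b, memory_order_relaxed)) {
    atomic_thread_fence(memory_order_acquire);
    /* Count the outcome the question's assert would reject. */
    if (atomic_load_explicit(&litmus_a, memory_order_relaxed) == 0)
      atomic_fetch_add_explicit(&litmus_violations, 1, memory_order_relaxed);
  }
  return 0;
}

/* Runs the three threads `trials` times. Returns the number of runs in
   which p3 saw b == 1 but a == 0, or -1 if a thread could not be created. */
int run_litmus_trials(int trials)
{
  assert(trials > 0);
  atomic_store(&litmus_violations, 0);
  for (int i = 0; i < trials; i++) {
    thrd_t t[3];
    thrd_start_t body[3] = { litmus_p1, litmus_p2, litmus_p3 };
    int started = 0;
    atomic_store(&litmus_a, 0);
    atomic_store(&litmus_b, 0);
    while (started < 3 &&
           thrd_create(&t[started], body[started], NULL) == thrd_success)
      started++;
    for (int j = 0; j < started; j++)
      thrd_join(t[j], NULL);
    if (started < 3)
      return -1;
  }
  return atomic_load(&litmus_violations);
}
```

Calling run_litmus_trials(1000) from main and checking that it returns 0 is one way to use it. A clean run on one machine is not evidence of the guarantee; it is the happens-before argument that rules the outcome out.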


    References to the corresponding statements in the standard:

    5.1.2.4 p.25: The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.

    So conflicting accesses that are all atomic cannot form a data race, by definition.

    5.1.2.4 p.22: ... if a value computation A of an atomic object M happens before a value computation B of M, and the value computed by A corresponds to the value stored by side effect X, then the value computed by B shall either equal the value computed by A, or be the value stored by side effect Y, where Y follows X in the modification order of M.

    The next paragraph notes that this is effectively the cache coherence guarantee. The C++11 standard is more specific and describes read-read coherence in similar wording.
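The read-read coherence rule quoted above can also be seen in isolation. The following is a minimal sketch; the names rr_counter, rr_writer, and count_rr_violations are hypothetical, and it again assumes <threads.h> is available. One thread stores increasing values with relaxed stores, so the modification order of the counter is 0, 1, 2, and so on. Another thread performs two relaxed loads in a row. The first load is sequenced before the second, so it happens before it, and the second must never return an older value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <threads.h>

/* Hypothetical counter written by a single thread only. */
static atomic_int rr_counter;

static int rr_writer(void *arg)
{
  int limit = *(int *)arg;
  /* Single writer, increasing values: modification order is 0, 1, ..., limit. */
  for (int v = 1; v <= limit; v++)
    atomic_store_explicit(&rr_counter, v, memory_order_relaxed);
  return 0;
}

/* Returns how many times a later load returned an older value than an
   earlier load in the same thread, or -1 if the writer could not be started. */
int count_rr_violations(int limit)
{
  thrd_t writer;
  int violations = 0;
  int first, second;

  assert(limit > 0);
  atomic_store(&rr_counter, 0);
  if (thrd_create(&writer, rr_writer, &limit) != thrd_success)
    return -1;
  do {
    first = atomic_load_explicit(&rr_counter, memory_order_relaxed);
    second = atomic_load_explicit(&rr_counter, memory_order_relaxed);
    if (second < first)
      violations++;
  } while (second < limit);  /* stop once the final store is visible */
  thrd_join(writer, NULL);
  return violations;
}
```

Under the quoted rule, count_rr_violations should return 0 for any positive limit, even though every access is relaxed and no fences are involved.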