Search code examples
c++concurrencylanguage-lawyermemory-barriersstdatomic

What is the difference between "happens before" and "precedes in a single total order" relations for memory_order_seq_cst operations?


It seems like there are no cases when one can conclude that a memory_order_seq_cst operation A precedes another one B in their single total order and yet neither A happens before B, nor B happens before A.

Thus, because of that and N4700 [atomics.order] 32.3\ 3, I assert that relations "happens-before" and "precedes in a single total order" are equivalent. So all these rules: 32.3\ 3, 4, 5, 6, 7 are redundant. They are already covered by coherence rules in [intro.races] 6.8.2.1\ 14, 15, 16, 17 and fence rules in [atomics.fences] 32.9\ 2, 3, 4.


Solution

  • S total ordering does not imply happens-before

    It's true that the relationship in [intro.races] between the coherence rules and the happens-before ordering, and the relationship in [atomics.order] between the coherency ordering and the S total order, are superficially similar. Abstractly, the principal difference is that the latter is an "if and only if" relationship, while the former is not.

    The [atomics.order] sequential consistency rules says that given two seq_cst operations A,B on the same object X, then B observes A if and only if A is ordered before B in S. The "only if" is what I want to focus on. Suppose A and B are both stores, storing different values a,b respectively. If at some much later time, we observe that X has the value b, then we can conclude that A was ordered before B in S.

    But happens-before is only one way. If A happens before B, then the coherence rules say that B observes A, but the converse is not true. If as above, A and B store two different values to X, then if we much later observe that X == b, we cannot conclude that A happened before B. Not even if A and B were both sequentially consistent. The most we can conclude is that B did not happen before A, and since "happens before" is only a partial order, that's not the same thing.

    There is only one case where observation implies happens-before: when A is a store and B is a load, and both are stronger than relaxed [atomics.order p2]. Other store/load combinations, or weaker orderings, don't do it for you.

    And this is important because happens-before extends to non-seq_cst operations, so it's the one you have to use if you want to reason about them.

    Sticking with our store-store example, consider the following program:

    std::atomic<int> a{0}, b{0};
    int c;
    
    void thr1() {
        a.store(1); // A (Ordering irrelevant)
        b.store(1, std::memory_order_seq_cst); // B1
    }
    
    void thr2() {
        b.store(2, std::memory_order_seq_cst); // B2
        c = a.load(std::memory_order_relaxed); // C (acquire ok too)
    }
    
    int main(void) {
        std::thread t1(thr1), t2(thr2);
        t1.join();
        t2.join();
        if (b == 2)
            assert(c == 1);
    }
    

    If b == 2 is true, then we know that B1 precedes B2 in the S ordering. If this also implied that B1 happens-before B2, then we would have by sequencing and transitivity that A happens-before C, and therefore (by write-read coherence) that C must have loaded the value 1 and the assert cannot fire.

    But this is not correct. We cannot get a happens-before relationship between A and C, because they are in different threads, and there is no release/acquire pair that could impose synchronization between them.

    In fact the assert can fire. Concretely, it is easy to see why: for purposes of ordering with non-seq_cst operations, B2 is merely a release store, and this does not prevent the load C from being reordered before it. Notice that this does not require IRIW reordering or anything fancy like that; it can happen for instance on ARM64 which is other multi-copy atomic.

    You can construct similar examples with a load observing a load, or a store observing a load.

    Happens-before does not imply S total ordering

    This one is more subtle. The standard does say that "strongly happens before" implies S ordering. But when consume operations are in play, "strongly happens before" is not the same as plain "happens before".

    I think that a counterexample is the following.

    std::atomic<int> a{0}, b{0}, c{0};
    
    void thr1() {
        a.store(1, std::memory_order_seq_cst); // AS1
        b.store(1, std::memory_order_release); // BS
    }
    
    void thr2() {
        int tb;
        do {
            tb = b.load(std::memory_order_consume); // BL, the last one
        } while (tb != 1);
        c.store(tb, std::memory_order_seq_cst);  // CS1
    }
    
    int main() {
        std::thread t1(thr1), t2(thr2);
        c.store(2, std::memory_order_seq_cst); // CS2
        a.store(2, std::memory_order_seq_cst); // AS2 
        t1.join();
        t2.join();
        if (c == 2) { // CL
            assert(a == 2); // AL
        }
    }
    

    BL denotes the last iteration of the consume load of b, the one that finally returned 1. That was the value stored by BS, so BS is dependency-ordered before BL. Moreover, BL carries a dependency to CS1, and so it follows that BS is dependency-ordered before CS1, which means that BS inter-thread happens before CS1. Finally, AS1 is sequenced before BS, and so AS1 inter-thread happens before CS1. In particular, AS1 happens before CS1.

    Suppose that CL returns 2. If it were the case that AS1 precedes CS1 in the total ordering, then having CL return 2 implies that CS1 precedes CS2. This in turn would imply that AS1 precedes AS2, and so AL must also return 2, and the assert would not fire.

    But I don't think we have any way to prove that. We can't get AS1 to strongly happen before CS1 because there is no synchronization between threads 1 and 2, only dependency ordering. AS1 is not coherency-ordered with respect to CS1 because they operate on different objects. And we cannot get an S ordering indirectly by transitivity because when CL returns 2, then all remaining sequentially consistent operations in the program (CS2, AS2, CL, AL) are ordered after CS1, so there would be no operation at all that could be ordered in between AS1 and CS1. As such, I don't think there is anything to prevent the assert from firing, in theory.

    I haven't yet been able to think of a mechanism by which this could actually happen in anything resembling real life, but maybe someone else can.

    (Incidentally, the example above shows one of the reasons that consume ordering is hard to get right. In c.store(tb), the variable tb always has the value 1 when this line is reached, yet the compiler must not optimize it into c.store(1) because the data dependency must be preserved.)


    For an example without consume, if I read correctly, we could do the following:

    std::atomic<int> a{0}, b{0};
    
    void thr1() {
        a.store(1, std::memory_order_seq_cst); // AS1
    }
    
    void thr2() {
        while (a.load(std::memory_order_acquire) != 1) { // AL1
            // spin
        }
        b.store(1, std::memory_order_seq_cst); // BS1
    }
    
    int main() {
        std::thread t1(thr1), t2(thr2);
        b.store(2, std::memory_order_seq_cst); // BS2
        a.store(2, std::memory_order_seq_cst); // AS2 
        t1.join();
        t2.join();
        if (b == 2) { // BL
            assert(a == 2); // AL2
        }
    }
    

    Clearly AS1 synchronizes with AL1, and so AS1 happens before BS1.

    But I claim we cannot conclude that AS1 precedes BS1 in S. Assume as before that BL returns 2. As above, since AS1 and BS1 operate on different objects, and there is no other seq_cst operation that could come between them, the only way to get an S ordering is to show that AS1 strongly happens before BS1. The definition of "strongly happens before" offers four ways for this to happen:

    • "AS1 is sequenced before BS1." No, they are in different threads.

    • "AS1 synchronizes with BS1, and both AS1 and BS1 are sequentially consistent atomic operations." No, synchronization requires a store and a load; two stores cannot synchronize.

    • "There are evaluations X and Y such that AS1 is sequenced before X, X simply happens before Y, and Y is sequenced before BS1": No, AS1 isn't sequenced before anything, so X does not exist.

    • "There is an evaluation X such that AS1 strongly happens before X, and X strongly happens before BS1." Maybe. The only candidate for X is AL1. Clearly AL1 does strongly happen before BS1 (by sequencing).

    So does AS1 strongly happen before AL1? Let's repeat the analysis.

    • "AS1 is sequenced before AL1." No, they are in different threads.

    • "AS1 synchronizes with AL1, and both AS1 and AL1 are sequentially consistent atomic operations." No. AS1 does synchronize with AL1, but AL1 is not sequentially consistent.

    • "There are evaluations X and Y such that AS1 is sequenced before X, X simply happens before Y, and Y is sequenced before AL1": No, AS1 isn't sequenced before anything, nor is anything sequenced before AL1.

    • "There is an evaluation X such that AS1 strongly happens before X, and X strongly happens before AL1." No, there is no candidate for X. I suppose you could consider the non-atomic initialization of a to 0 (call it AS0), which does strongly happen before AL1, but it obviously also happens before AS1 and not the other way around. Ditto for BS0.

    Thus, we cannot prove that AS1 precedes BS1 in the S ordering. So even if BS1 precedes BS2 so that BL returns 2, we could still have AS2 precede AS1 without contradiction, in which case AL2 would return 2 and the assert would fire.

    As to how this might actually be implemented, so far the best I can think of is something like POWER where due to the topology, certain cores may snoop each others' store buffers. Relaxed and acquire loads may do this, but sequentially consistent loads may not; they go directly to the coherent L1d cache. Release and sequentially consistent stores are preceded by a barrier which completely drains the store buffer. In that case, it's possible that AL1 observes AS1 by snooping the store buffer, then executes BS1 which commits immediately; whereas AS1 does not commit globally until sometime later.

    If the synchronization between thr1 and thr2 were done by a release-acquire pair in which the release store wasn't AS1 itself but a separate store, then the barrier preceding that second store would ensure that AS1 was globally visible before AL1 and thus before BS1, which is the third case in the "strongly happens before" definition. So I think this model would provide all the C++ ordering semantics.