
Rust SeqCst ordering


I'm trying to understand how Ordering::SeqCst works. For that I have a few code examples where this ordering is mandatory for obtaining a consistent result. In the first example we just want to increment a counter variable:

use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread::spawn;

let a: &'static _ = Box::leak(Box::new(AtomicBool::new(false)));
let b: &'static _ = Box::leak(Box::new(AtomicBool::new(false)));
let counter: &'static _ = Box::leak(Box::new(AtomicUsize::new(0)));

let _thread_a = spawn(move || a.store(true, Ordering::Release));
let _thread_b = spawn(move || b.store(true, Ordering::Release));

let thread_1 = spawn(move || {
    while !a.load(Ordering::Acquire) {} // prevents everything after from being reordered before this load

    if b.load(Ordering::Relaxed) { // no need for Acquire due to the previous restriction
        counter.fetch_add(1, Ordering::Relaxed);
    }
});
let thread_2 = spawn(move || {
    while !b.load(Ordering::Acquire) {} // prevents everything after from being reordered before this load

    if a.load(Ordering::Relaxed) { // no need for Acquire due to the previous restriction
        counter.fetch_add(1, Ordering::Relaxed);
    }
});

thread_1.join().unwrap();
thread_2.join().unwrap();
println!("{}", counter.load(Ordering::Relaxed));

Possible values of counter in this example are 1 or 2, depending on thread scheduling. But surprisingly 0 is also possible, and I don't understand how.

If thread_1 starts when only a has been set to true by _thread_a, counter will be left untouched when thread_1 exits.

If thread_2 then starts after thread_1 has finished (at that point we know a is already true), thread_2 just has to wait for b to become true, and counter will be incremented once.

Or, if thread_2 runs first and only b was set to true, counter will likewise be incremented only once.

There is also the possibility that _thread_a and _thread_b both run before thread_1 and thread_2, in which case both of them increment counter. So that's why 1 and 2 are valid outcomes for counter. But as I said before, 0 is also a possible result, unless I replace all loads and stores of a and b with Ordering::SeqCst:

let _thread_a = spawn(move || a.store(true, Ordering::SeqCst));
let _thread_b = spawn(move || b.store(true, Ordering::SeqCst));

let thread_1 = spawn(move || {
    while !a.load(Ordering::SeqCst) {}

    if b.load(Ordering::SeqCst) {
        counter.fetch_add(1, Ordering::Relaxed);
    }
});
let thread_2 = spawn(move || {
    while !b.load(Ordering::SeqCst) {}

    if a.load(Ordering::SeqCst) {
        counter.fetch_add(1, Ordering::Relaxed);
    }
});

thread_1.join().unwrap();
thread_2.join().unwrap();
println!("{}", counter.load(Ordering::SeqCst));

Now 0 isn't possible, but I don't know why.

The second example was taken from here:

use std::sync::atomic::{AtomicBool, Ordering::SeqCst};
use std::thread;

static A: AtomicBool = AtomicBool::new(false);
static B: AtomicBool = AtomicBool::new(false);

static mut S: String = String::new();

fn main() {
    let a = thread::spawn(|| {
        A.store(true, SeqCst);
        if !B.load(SeqCst) {
            unsafe { S.push('!') };
        }
    });

    let b = thread::spawn(|| {
        B.store(true, SeqCst);
        if !A.load(SeqCst) {
            unsafe { S.push('!') };
        }
    });

    a.join().unwrap();
    b.join().unwrap();
}

Threads a and b could start at the same time and each modify A and B, so that neither of them modifies S. Or one of them could start before the other and modify S, leaving the other thread with S unmodified. If I understood correctly, there is no possibility of S being modified in parallel by both threads? The only reason Ordering::SeqCst is useful here is to prevent reordering. But what if I replace all the orderings like this:

let a = thread::spawn(|| {
    A.store(true, Release); // nothing can be placed before
    if !B.load(Acquire) { // nothing can be reordered after
        unsafe { S.push('!') };
    }
});
    
let b = thread::spawn(|| {
    B.store(true, Release); // nothing can be placed before
    if !A.load(Acquire) { // nothing can be reordered after
        unsafe { S.push('!') };
    }
});

Isn't it the same as the original?

Also, the Rust docs refer to the C++ docs on ordering, where Ordering::SeqCst is described as:

Atomic operations tagged memory_order_seq_cst not only order memory the same way as release/acquire ordering (everything that happened-before a store in one thread becomes a visible side effect in the thread that did a load), but also establish a single total modification order of all atomic operations that are so tagged.

What does this single total modification order look like in a concrete example?


Solution

  • Although Chayim's answer is correct, I think it's still very difficult to wrap one's head around the concept. A useful mental model to remember is that reordering of operations may happen whenever it is not forbidden and the operations access different parts of memory from the perspective of the current thread. In the Rustonomicon two types of reordering are described:

    Compiler Reordering - if performance is improved while the result stays the same, why not reorder?

    Hardware Reordering - this topic is more subtle, and apparently the Rustonomicon's reasoning about caches causing reordering is, strictly speaking, incorrect (thanks a lot to @peter-cordes for pointing that out). CPU caches are always coherent (the MESI protocol guarantees invalidation of other copies before a write is committed to a cache line). Out-of-order execution is one reason reordering may happen in a CPU core, but it is not the only one. Here are two more examples of what may cause reordering in the CPU, and here is a detailed explanation of one of the examples (with still a lot of discussion afterwards).

    And here is an excellent document (Howells, McKenney, Deacon, Zijlstra) touching on both hardware and compiler reordering, which claims to still be incomplete.

    We may use either hardware or compiler reordering to explain the first example:

    let a: &'static _ = Box::leak(Box::new(AtomicBool::new(false)));
    let b: &'static _ = Box::leak(Box::new(AtomicBool::new(false)));
    let counter: &'static _ = Box::leak(Box::new(AtomicUsize::new(0)));
    
    1) let _thread_a = spawn(move || a.store(true, Ordering::Release));
    2) let _thread_b = spawn(move || b.store(true, Ordering::Release));
    
    let thread_1 = spawn(move || {
        3) while !a.load(Ordering::Acquire) {} // prevents everything after from being reordered before this load

        4) if b.load(Ordering::Relaxed) { // no need for Acquire due to the previous restriction
            counter.fetch_add(1, Ordering::Relaxed);
        }
    });
    let thread_2 = spawn(move || {
        5) while !b.load(Ordering::Acquire) {} // prevents everything after from being reordered before this load

        6) if a.load(Ordering::Relaxed) { // no need for Acquire due to the previous restriction
            counter.fetch_add(1, Ordering::Relaxed);
        }
    });
    

    When thread_2 is running on its core, it sees a, b, and operations 5) and 6). 6) is Relaxed, and 5) and 6) access different memory locations and don't affect each other from thread_2's point of view, so nothing prevents 6) from executing before 5). If the hoisted 6) reads a as false (and, symmetrically, the hoisted 4) reads b as false in thread_1), neither thread increments counter, which is how 0 happens.

    As for the second example:

    let a = thread::spawn(|| {
        1) A.store(true, Release); // nothing can be placed before
        2) if !B.load(Acquire) { // nothing can be reordered after
            unsafe { S.push('!') };
        }
    });
        
    let b = thread::spawn(|| {
        3) B.store(true, Release); // nothing can be placed before
        4) if !A.load(Acquire) { // nothing can be reordered after
            unsafe { S.push('!') };
        }
    });
    

    There is a similar question discussing this.

    If we look at this paper (it is for C++, but as you rightly noticed, Rust uses the same memory model), it describes the memory guarantees of Release and Acquire as follows:

    • atomic operations on the same object may never be reordered [CSC12, 1.10.19, p. 14],

    • (non-)atomic write operations that are sequenced before a release operation A may not be reordered after A,

    • (non-)atomic load operations that are sequenced after an acquire operation A may not be reordered before A.

    So in principle it is not forbidden to reorder 1) <-> 2) or 3) <-> 4): the Release store only constrains operations sequenced before it, and the Acquire load only constrains operations sequenced after it, so nothing stops the store and the subsequent load from swapping places. After that swap both threads can read false, and both can push into S.