
How is a memory barrier used in the Linux kernel?


The kernel source file Documentation/memory-barriers.txt contains this illustration:

    CPU 1                   CPU 2
    ======================= =======================
            { B = 7; X = 9; Y = 8; C = &Y }
    STORE A = 1
    STORE B = 2
    <write barrier>
    STORE C = &B            LOAD X
    STORE D = 4             LOAD C (gets &B)
                            LOAD *C (reads B)

Without intervention, CPU 2 may perceive the events on CPU 1 in some effectively random order, despite the write barrier issued by CPU 1:

    +-------+       :      :                :       :
    |       |       +------+                +-------+  | Sequence of update
    |       |------>| B=2  |-----       --->| Y->8  |  | of perception on
    |       |  :    +------+     \          +-------+  | CPU 2
    | CPU 1 |  :    | A=1  |      \     --->| C->&Y |  V
    |       |       +------+       |        +-------+
    |       |   wwwwwwwwwwwwwwww   |        :       :
    |       |       +------+       |        :       :
    |       |  :    | C=&B |---    |        :       :       +-------+
    |       |  :    +------+   \   |        +-------+       |       |
    |       |------>| D=4  |    ----------->| C->&B |------>|       |
    |       |       +------+       |        +-------+       |       |
    +-------+       :      :       |        :       :       |       |
                                   |        :       :       |       |
                                   |        :       :       | CPU 2 |
                                   |        +-------+       |       |
        Apparently incorrect --->  |        | B->7  |------>|       |
        perception of B (!)        |        +-------+       |       |
                                   |        :       :       |       |
                                   |        +-------+       |       |
        The load of X holds --->    \       | X->9  |------>|       |
        up the maintenance           \      +-------+       |       |
        of coherence of B             ----->| B->2  |       +-------+
                                            +-------+
                                            :       :

I don't understand this. Since there is a write barrier, every store before it must have taken effect by the time C = &B is executed, which means B would already equal 2. So by the time CPU 2 gets &B as the value of C, B should already be 2 for CPU 2 as well. Why would it perceive B as 7? I am really confused.


Solution

  • The key missing point is the mistaken assumption that for the sequence:

    LOAD C (gets &B)
    LOAD *C (reads B)
    

    the first load has to precede the second load. A weakly ordered architecture can act "as if" the following happened:

    LOAD B (reads B)
    LOAD C (gets &B)
    if (C != &B)
        LOAD *C
    else
        Congratulate self on having already loaded *C
    

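    The speculative "LOAD B" can happen, for example, because B shares a cache line with some other variable that was loaded earlier, or because hardware prefetching fetched it. CPU 1's write barrier only orders CPU 1's own stores; it cannot stop CPU 2 from using a value of B that it loaded too early. To be sure of seeing B == 2, CPU 2 must pair the write barrier with a barrier of its own (a data dependency or read barrier) between LOAD C and LOAD *C.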