
About memory barriers: why is the following example a bug?


I read this document: https://www.kernel.org/doc/Documentation/memory-barriers.txt

The document shows the following example. (The preceding paragraph ends with "So don't leave out the ACCESS_ONCE().")

It is tempting to try to enforce ordering on identical stores on both branches of the "if" statement as follows:

q = ACCESS_ONCE(a);
if (q) {
    barrier();
    ACCESS_ONCE(b) = p;
    do_something();
} else {
    barrier();
    ACCESS_ONCE(b) = p;
    do_something_else();
}

Unfortunately, current compilers will transform this as follows at high optimization levels:

q = ACCESS_ONCE(a);
barrier();
ACCESS_ONCE(b) = p;  /* BUG: No ordering vs. load from a!!! */
if (q) {
    /* ACCESS_ONCE(b) = p; -- moved up, BUG!!! */
    do_something();
} else {
    /* ACCESS_ONCE(b) = p; -- moved up, BUG!!! */
    do_something_else();
}

I don't understand why "moved up" is a bug. If I were writing this code myself, I would move the ACCESS_ONCE(b) store up, because both the if and else branches execute the same code.


Solution

  • It isn't so much that the moving up is a bug, it's that it exposes a bug in the code.

    The intention was to use the conditional on q (from a), to ensure that the write to b is done after the read from a; because both stores are "protected" by a conditional and "stores are not speculated", the CPU shouldn't be making the store until it knows the outcome of the condition, which requires the read to have been done first.

    The compiler defeats this intention by seeing that both branches of the conditional start with the same thing, so in a formal sense those statements do not depend on the condition and can be hoisted above the "if". The problem with this is explained in the next paragraph of the document:

    Now there is no conditional between the load from 'a' and the store to 'b', which means that the CPU is within its rights to reorder them: The conditional is absolutely required, and must be present in the assembly code even after all compiler optimizations have been applied.

    I'm not experienced enough to know exactly what is meant by barrier(), but apparently it is not powerful enough to enforce the ordering between the two independent memory operations.
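
On the barrier() question: in the kernel, barrier() is a compiler-only barrier. It stops the compiler from moving memory accesses across it, but it emits no instruction, so the CPU remains free to reorder the load and the store. If I recall correctly, the same document goes on to recommend an explicit barrier such as smp_store_release() when ordering is needed here. Below is a minimal user-space sketch of that idea using C11 atomics rather than the kernel macros. The variables, the function name store_after_load, and the counters are hypothetical, and the mapping to ACCESS_ONCE() and smp_store_release() is only approximate.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared variables standing in for 'a' and 'b' from the question. */
static atomic_int a, b;

/* Hypothetical stand-ins for do_something() / do_something_else(). */
static int something_calls, something_else_calls;
static void do_something(void)      { something_calls++; }
static void do_something_else(void) { something_else_calls++; }

/*
 * Load 'a', then store p to 'b'. The release store may not be reordered
 * before the earlier load, by the compiler or by the CPU, so the ordering
 * no longer depends on the conditional surviving optimization.
 */
static void store_after_load(int p)
{
    int q = atomic_load_explicit(&a, memory_order_relaxed); /* roughly ACCESS_ONCE(a) */

    atomic_store_explicit(&b, p, memory_order_release);     /* roughly smp_store_release(&b, p) */
    if (q)
        do_something();
    else
        do_something_else();
}

/* Single-threaded check of the control flow only; it cannot demonstrate ordering. */
static void smoke_test(void)
{
    atomic_store(&a, 1);
    store_after_load(7);
    assert(atomic_load(&b) == 7);
    assert(something_calls == 1 && something_else_calls == 0);
}
```

Because the ordering now comes from the release store itself, it does not matter that the store sits above the "if": hoisting it, as the compiler did in the question, is harmless in this version.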