Search code examples
coptimizationlanguage-lawyer

Can the C compiler optimize a conditional store into an unconditional store?


Take the following piece of code:

#include <stdbool.h>

bool global;

bool foo(void) {
    if (global) {
        global = false;
        return true;
    }
    return false;
}

Is this code theoretically equal to the following?

#include <stdbool.h>

bool global;

bool baz(void) {
    bool tmp = global;
    global = false;
    return tmp;
}

I am inclined to think that foo and baz are theoretically equivalent: global is not volatile and there is nothing else forcing the compiler to consider multithreading (something like atomic_bool, for example). So I would think that if a branch is way more expensive than a store, then the compiler would opt to go with the baz solution instead of the foo one. But I cannot get the compiler to actually do this, taking the GCC 13.2.0 RISC-V compiler with -std=gnu2x -march=rv32if -mabi=ilp32f -O3 -mbranch-cost=2000 still results in foo. On the other hand, I am also unable to make the compiler turn baz into foo.

So am I wrong and is there a theoretical difference between foo and baz? Or is this just an optimization GCC (RISC-V?) does not take?


Solution

  • The two functions differ in a multithreaded environment. Consider what happens if there is a thread that reads global from time to time and foo or baz is called while global contains false. In this situation, foo will not store to global but baz will.

    C 2018 5.1.2.4 4 defines conflict:

    Two expression evaluations conflict if one of them modifies a memory location and the other one reads or modifies the same memory location.

    C 2018 51.2.4 35 says:

    The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.

    This means the behavior with baz is undefined, since the modification to global (Note 2 in C 2018 3.1 3 tells us “‘Modify’ includes the case where the new value being stored is the same as the previous value.”) in the thread calling baz conflicts with the read in the other thread.

    However, the behavior with foo remains defined in this situation, as it does not modify global. (Since the behavior of baz is undefined, the compiler could generate the same code for it that it does for foo.)

    Note this difference in the semantics of the routines does not preclude the compiler from generating the same code for them. In a single thread, the two routines have the same semantics as defined by the C standard:

    • There is no observable behavior inside either routine (no access to volatile objects, no data written to files, no input/output interactions).
    • After either routine is called, global contains false and the return value is true.
    • Neither routine makes any other changes to program state.

    Therefore, regardless of which routine is called, the single-thread effects on observable behavior and abstract machine state are identical.