Search code examples
coptimizationboolean-expression

Short Circuit Boolean operators in C (again): doesn't it produce inefficient code by default?


I know that the topic "short circuit operators" has been discussed a lot here on StackExchange. But w.r.t specifically the 'C' language, there is an aspect that bothers me for quite a while.

Consider some C code like

if (b && f(&s)) {...}

where b is of boolean type, s is some variable that has the type of some more or less elaborate struct and f is a getter to the struct. A more concrete example may look like:

if (active && isEmpty(&list)) {...}

Usually, I would expect that any decent compiler will inline the isEmpty function and optimize the whole expression to a single AND assembler instruction with the first operand already held in a data register and the second operand in memory, accessed via 'address register with offset' addressing mode (or something similar).

However, since the compiler is enforced (by the language rules) to generate the short circuit form, this kind of optimization must not be done and would ultimately produce drastically inefficient code. It is hard for me to believe that this is actually happening!

What version of assembler code will an optimizing C compiler usually generate?


Solution

  • What version of assembler code will an optimizing C compiler usually generate?

    This is an evolving question, and compilers have gotten better and better over the years. Good programmers become familiar with the capabilities of their compilers and write code that takes advantage of it. Thus, if active && isEmpty(&list) versus isEmpty(&list) && active is an issue, a good programmer will choose the one that is better for their situation, and/or they may dive into the implementation of isEmpty and see if it can be improved for this.

    However, since the compiler is enforced (by the language rules) to generate the short circuit form,…

    This is true and not true. The C standard specifies behavior on two levels. The semantics of C are specified using a model of an abstract computer that explicitly does what the standard says. For &&, it says “… the && operator guarantees left-to-right evaluation…,” so it is true that is what the abstract computer does in the model.

    But a C implementation does not have to implement that model literally. On the “what is really implemented” level, instead of on the model level, a C implementation only has to match the observable behavior of the model, which is (C 2018 5.1.2.3 6):

    • Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.
    • At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.
    • The input and output dynamics of interactive devices shall take place as specified in 7.21.3. The intent of these requirements is that unbuffered or line-buffered output appear as soon as possible, to ensure that prompting messages actually appear prior to a program waiting for input.

    Now suppose the C compiler can see the implementation of isEmpty, as it would when inlining it. In this case, it can see all the effects of active && isEmpty(&list). If active && isEmpty(&list) contains no accesses to volatile objects and no writing to files or interacting with devices, then the actual implementation of the expression can be in any order at all, including any ordering of any subparts inside isEmpty, providing the reordering computes the correct result. So, in the actual implementation, it is not true the compiler must guarantee the left-to-right ordering; it merely needs to guarantee the same result.

    That correct result is an issue. Maybe active guarantees that evaluation of isEmpty(&list) will not use some null pointer, so active needs to be evaluated “first.” However, if the compiler is inlining isEmpty, it could evaluate parts of isEmpty first, as long as it ensures none of its evaluations cause a problem if active is false.

    One thing that can block these sorts of optimizations is that isEmpty might use other functions or objects that the compiler does not have a complete view of, so the compiler is forced to implement the abstract computer more literally in certain respects. This goes back to the good programmer issue—if this part of the program matters, a good programmer will consider their interactions here between the code for isEmpty and what the compiler capabilities are. And that will likely change over the coming years.