I know that the topic of short-circuit operators has been discussed a lot here on Stack Exchange. But with respect to the C language specifically, there is an aspect that has bothered me for quite a while.
Consider some C code like
if (b && f(&s)) {...}
where b is of boolean type, s is a variable of some more or less elaborate struct type, and f is a getter for the struct. A more concrete example might look like:
if (active && isEmpty(&list)) {...}
Usually, I would expect any decent compiler to inline the isEmpty function and optimize the whole expression down to a single AND assembler instruction, with the first operand already held in a data register and the second operand in memory, accessed via an 'address register with offset' addressing mode (or something similar).
However, since the compiler is forced (by the language rules) to generate the short-circuit form, this kind of optimization cannot be done and would ultimately produce drastically inefficient code. It is hard for me to believe that this is actually happening!
What version of assembler code will an optimizing C compiler usually generate?
> What version of assembler code will an optimizing C compiler usually generate?
This is an evolving question, and compilers have gotten better and better over the years. Good programmers become familiar with the capabilities of their compilers and write code that takes advantage of them. Thus, if active && isEmpty(&list) versus isEmpty(&list) && active is an issue, a good programmer will choose the one that is better for their situation, and/or they may dive into the implementation of isEmpty and see if it can be improved.
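To make the discussion concrete, here is a minimal, self-contained version of the pattern. The struct layout and the body of isEmpty are assumptions made purely for illustration; with the getter visible like this, compilers commonly inline it, and whether they then emit a branch or a branchless test/AND sequence depends on the compiler and target.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list type standing in for the question's `list`. */
struct list {
    void *head;
};

/* A typical trivially inlinable getter. */
static bool isEmpty(const struct list *l) {
    return l->head == NULL;
}

/* The pattern from the question. */
bool check(bool active, const struct list *list) {
    return active && isEmpty(list);
}
```

Pasting this into a tool like Compiler Explorer and varying the optimization level is an easy way to see what your particular compiler actually does with it.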
> However, since the compiler is forced (by the language rules) to generate the short-circuit form,…
This is true and not true. The C standard specifies behavior on two levels. The semantics of C are specified using a model of an abstract computer that explicitly does what the standard says. For &&, it says "… the && operator guarantees left-to-right evaluation …," so it is true that this is what the abstract computer does in the model.

But a C implementation does not have to implement that model literally. On the "what is really implemented" level, instead of on the model level, a C implementation only has to match the observable behavior of the model, which C 2018 5.1.2.3 6 defines in terms of accesses to volatile objects, data written into files, and the input and output dynamics of interactive devices.
Now suppose the C compiler can see the implementation of isEmpty, as it would when inlining it. In this case, it can see all the effects of active && isEmpty(&list). If the expression contains no accesses to volatile objects and no writing to files or interacting with devices, then the actual implementation of the expression can be in any order at all, including any ordering of any subparts inside isEmpty, provided the reordering computes the correct result. So, in the actual implementation, it is not true that the compiler must guarantee the left-to-right ordering; it merely needs to guarantee the same result.
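Conversely, when the right-hand operand does touch observable behavior, the compiler loses this freedom. A sketch, using a hypothetical volatile variable in place of a real memory-mapped register:

```c
#include <stdbool.h>

/* Hypothetical memory-mapped status register; volatile makes every
 * access to it part of the program's observable behavior. */
volatile int device_reg;

bool poll(bool active) {
    /* The read of device_reg must not be duplicated, reordered, or
     * performed speculatively when active is false: in the abstract
     * machine the read happens only when active is true, and volatile
     * accesses must follow the abstract machine exactly. */
    return active && (device_reg != 0);
}
```

Here the compiler really does have to emit a branch that skips the volatile read when active is false; the branchless AND the question hopes for is not a legal transformation.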
That correct result is an issue. Maybe active guarantees that evaluation of isEmpty(&list) will not use some null pointer, so active needs to be evaluated "first." However, if the compiler is inlining isEmpty, it could evaluate parts of isEmpty first, as long as it ensures none of those evaluations cause a problem if active is false.
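A common instance of this guarding pattern, again with an assumed struct layout for illustration:

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed layout, for illustration only. */
struct list {
    void *head;
};

static bool isEmpty(const struct list *l) {
    return l->head == NULL;
}

bool checkGuarded(const struct list *list) {
    /* The left operand guards the right: in the abstract machine,
     * isEmpty(list) is never evaluated when list is NULL.  After
     * inlining, a compiler may hoist the load of list->head ahead of
     * the null test only if it can prove the speculative load cannot
     * fault; otherwise it must keep the check first. */
    return list != NULL && isEmpty(list);
}
```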
One thing that can block these sorts of optimizations is that isEmpty might use other functions or objects that the compiler does not have a complete view of, so the compiler is forced to implement the abstract computer more literally in certain respects. This goes back to the good-programmer issue: if this part of the program matters, a good programmer will consider the interactions between the code for isEmpty and what the compiler is capable of. And that will likely change over the coming years.