Do modern compilers (or perhaps this has been in place since C89) substitute short-circuit-evaluated code for cases like the one below when evaluating conditional expressions?
char mystring[32] = "this is a long line";
if((strnlen(mystring, 32)) > 2)
{
    return 1;
}
That is, is the right operand taken into account while processing strnlen(...), so that the moment the running length of the C string within strnlen(...) exceeds the right operand of the outer conditional expression (2 in this case), strnlen(...) breaks out?
Maybe, depending on the compiler. Let's look at some examples, compiled with gcc 13.2.0 and clang 17.0.1, both at optimization level -O3 and with extensions enabled (note that strnlen is POSIX, not standard C).
#include <string.h>  /* strnlen is POSIX; needs extensions enabled */

int p() {
    char mystring[32] = "this is a long line";
    return strnlen(mystring, 32) > 2;
}
Both clang and gcc optimize this to mov eax, 1; ret. This is because they know the behavior of strnlen and can substitute in the return value of the call without needing to evaluate it at runtime. (In gcc this is implemented via __builtin_strnlen.)
If the strnlen function isn't a known builtin, but can be inlined:
#include <stddef.h>  /* size_t */

inline size_t my_strnlen(char const* s, size_t n) {
    for (size_t i = 0; i != n; ++i)
        if (s[i] == 0)
            return i;
    return n;
}
int p() {
    char mystring[32] = "this is a long line";
    return my_strnlen(mystring, 32) > 2;
}
Here clang optimizes to mov eax, 1 but gcc emits a loop.
Finally, for an unknown predicate that is marked pure to tell the optimizer that it has no side effects:
__attribute__((pure)) int f(char);
inline size_t my_strnlen_f(char const* s, size_t n) {
    for (size_t i = 0; i != n; ++i)
        if (f(s[i]))
            return i;
    return n;
}
int p() {
    char mystring[32] = "this is a long line";
    return my_strnlen_f(mystring, 32) > 2;
}
gcc again emits a loop; clang emits some rather clumsy code (what's up with ebx?) that nevertheless shows that it knows that f needs to be called no more than 3 times, with the character codes of the first 3 characters - it optimizes out the full string:
p: # @p
push rbx
mov edi, 116 # 't'
call f@PLT
xor ebx, ebx
test eax, eax
je .LBB2_1
.LBB2_3:
mov eax, ebx
pop rbx
ret
.LBB2_1:
mov edi, 104 # 'h'
call f@PLT
test eax, eax
jne .LBB2_3
mov edi, 105 # 'i'
call f@PLT
xor ebx, ebx
test eax, eax
sete bl
mov eax, ebx
pop rbx
ret
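To make the assembly above easier to follow, here is a rough C sketch of what it appears to compute, next to the original function. The stand-in f and the stop_char variable are hypothetical, added only so the sketch can be run; the real f is unknown to the compiler. As far as I can tell, ebx is used because rbx is callee-saved in the System V x86-64 ABI, so the zero stored there survives the later calls to f.

```c
#include <stddef.h>

/* Hypothetical stand-in for the unknown predicate f (an assumption for
   illustration): true only for one chosen character, never when unset. */
static char stop_char = 0;
static int f(char c) { return stop_char != 0 && c == stop_char; }

static size_t my_strnlen_f(char const* s, size_t n) {
    for (size_t i = 0; i != n; ++i)
        if (f(s[i]))
            return i;
    return n;
}

/* The function as written in the answer. */
static int p_original(void) {
    char mystring[32] = "this is a long line";
    return my_strnlen_f(mystring, 32) > 2;
}

/* Hand-written equivalent of the three-call assembly: the result is
   decided by the first three characters 't', 'h', 'i' alone. */
static int p_unrolled(void) {
    if (f('t')) return 0;   /* would return index 0, and 0 > 2 is false */
    if (f('h')) return 0;   /* would return index 1, and 1 > 2 is false */
    return f('i') == 0;     /* index 2 fails too; anything later is > 2 */
}
```

The two functions agree for any side-effect-free f, which is why marking f as pure matters: without it, the later calls could not be dropped.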
- Would it have mattered if I hadn't preassigned the string length?
No, in that case the C language would just set the buffer size to the size of the string literal (string length + 1 for the terminator).
- Would it have mattered if I had removed the parentheses from the IF inner expression?
No, the optimizer runs on a program representation that does not include these details of syntax.
- Would it have mattered if I had switched the operands and the operator to a <?
Almost certainly not; the optimizer is capable of understanding that these are equivalent.
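If you would rather not depend on the optimizer at all, the early exit described in the question can be written by hand. Below is a minimal sketch; longer_than is a hypothetical helper name, not a library function. With POSIX strnlen the same effect comes from capping the scan, e.g. strnlen(mystring, 3) > 2.

```c
#include <stddef.h>

/* Hypothetical helper: nonzero if s holds more than `limit` characters
   before its terminator, reading at most `bufsize` bytes and stopping
   as soon as the answer is known. */
static int longer_than(char const* s, size_t limit, size_t bufsize) {
    for (size_t i = 0; i < bufsize; ++i) {
        if (s[i] == '\0')
            return 0;       /* length is i, and i <= limit here */
        if (i == limit)
            return 1;       /* limit + 1 non-null characters seen */
    }
    return 0;               /* buffer exhausted before exceeding limit */
}
```

For the string in the question, longer_than(mystring, 2, 32) reads only the first three characters.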