I'm using an AMD64 (x86-64) computer (Intel Pentium Gold 4415U) to compare the assembly instructions generated from some C code (strictly speaking, the disassembly).
On Windows 10, I used Visual Studio 2017 (15.2) with its C compiler. My example code is shown below:
int main() {
    int i = 0;
    if (++i == 4);
    if (i++ == 4);
    return 0;
}
The disassembly is shown below:
mov eax,dword ptr [i] // if (++i == 4);
inc eax
mov dword ptr [i],eax
mov eax,dword ptr [i] // if (i++ == 4);
mov dword ptr [rbp+0D4h],eax ; save old i to a temporary
mov eax,dword ptr [i]
inc eax
mov dword ptr [i],eax
cmp dword ptr [rbp+0D4h],4 ; compare with previous i
jne main+51h (07FF7DDBF3601h)
mov dword ptr [rbp+0D8h],1
jmp main+5Bh (07FF7DDBF360Bh)
*mov dword ptr [rbp+0D8h],0
07FF7DDBF3601 is the address of the last instruction (marked with *).
07FF7DDBF360B is the address of 'return 0;'.
In if (++i == 4), the program doesn't check whether the incremented i satisfies the condition.
However, in if (i++ == 4), the program saves the previous value of i to the stack and then does the increment. Afterwards, it compares the previous i with the constant integer 4.
What causes the difference between these two C statements? Is it just how the compiler works? Would it be different with more complex code?
I searched with Google but couldn't find the origin of the difference. Do I just have to accept that 'this is just compiler behavior'?
Like Paul says, the program has no observable side-effects, and with optimization enabled MSVC or any of the other major compilers (gcc/clang/ICC) will compile main to simply xor eax,eax / ret.
i's value never escapes the function (it's not stored to a global or returned), so it can be optimized away entirely. And even if it did escape, constant-propagation is trivial here.
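To make the "value escapes" case concrete, here's a minimal sketch. The function name count_matches and the hits variable are made up for illustration; they're not from the question. Both comparison results and the final i are returned, so they're observable, but every input is a compile-time constant, so I'd still expect an optimizing build to fold the whole function to a constant return value.

```c
/* Hypothetical variant of the question's code: the comparison results
   and the final value of i escape through the return value. */
int count_matches(void)
{
    int i = 0;
    int hits = 0;

    if (++i == 4)   /* i becomes 1, then 1 is compared with 4 */
        hits++;
    if (i++ == 4)   /* old value 1 is compared with 4, then i becomes 2 */
        hits++;

    return hits + i;   /* 0 + 2 */
}
```

Calling count_matches() always yields 2, which is what constant-propagation lets the compiler compute at build time.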
It's just a quirk / implementation detail that MSVC's debug-mode anti-optimized code-gen decides not to emit a cmp/jcc over an empty if body; even in debug mode that wouldn't be helpful for debugging at all. It would be a branch instruction that jumps to the same address it falls through to.
The point of debug-mode code is that you can single-step through source lines and modify C variables with a debugger, not that the asm is a literal and faithful transliteration of C. (And also that the compiler generates it quickly, without spending any effort on quality, to speed up edit/compile/run cycles.) See: Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?
Exactly how braindead the compiler's code-gen is doesn't depend on any language rules; there are no actual standards that define what compilers have to do in debug-mode as far as actually using a branch instruction for an empty if body.
Apparently with your compiler version, the i++ post-increment was enough to make the compiler forget that the if body was empty?
I can't reproduce your result with MSVC 19.0 or 19.10 (VS2015 or VS2017) on the Godbolt compiler explorer, in 32-bit or 64-bit mode, or with any other MSVC version. I get no conditional branches at all from MSVC, ICC, or gcc.
MSVC does implement i++ with an actual store to memory for the old value, like you show, though. So terrible. GCC -O0 makes significantly more efficient debug-mode code. Still pretty braindead of course, but within a single statement it's sometimes a lot less bad.
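Written out by hand in C, the two comparisons look roughly like this. It's only a sketch of the semantics: the helper names pre_inc_equals and post_inc_equals and the local old are hypothetical, with old playing the role of the [rbp+0D4h] temporary in your disassembly.

```c
/* ++i == k: increment first, then compare the updated value. */
int pre_inc_equals(int *i, int k)
{
    *i = *i + 1;
    return *i == k;
}

/* i++ == k: save the old value, increment, then compare the saved copy.
   'old' corresponds to the stack temporary in the MSVC debug build. */
int post_inc_equals(int *i, int k)
{
    int old = *i;
    *i = *i + 1;
    return old == k;
}
```

Starting from i == 3, pre_inc_equals(&i, 4) is true while post_inc_equals(&i, 4) is false, and i ends up as 4 either way. Whether the saved copy lives in memory or in a register is purely up to the compiler.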
I can reproduce it with clang, though! (But it branches for both ifs):
# clang8.0 -O0
main: # @main
push rbp
mov rbp, rsp
mov dword ptr [rbp - 4], 0 # default return value
mov dword ptr [rbp - 8], 0 # int i=0;
mov eax, dword ptr [rbp - 8]
add eax, 1
mov dword ptr [rbp - 8], eax
cmp eax, 4 # uses the ++i result still in a register
jne .LBB0_2 # jump over if() body
jmp .LBB0_2 # jump over else body, I think.
.LBB0_2:
mov eax, dword ptr [rbp - 8]
mov ecx, eax
add ecx, 1 # i++ uses a 2nd register
mov dword ptr [rbp - 8], ecx
cmp eax, 4
jne .LBB0_4
jmp .LBB0_4
.LBB0_4:
xor eax, eax # return 0
pop rbp # tear down stack frame.
ret
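The branch structure of that listing can be sketched back in C with gotos. This is my own hand-written rendering, not compiler output: the function name main_like and the variable tmp are made up, and the labels are just named after the asm labels.

```c
/* Hand-written C mirroring the control flow of the clang -O0 listing. */
int main_like(void)
{
    int i = 0;
    int tmp;

    tmp = i + 1;        /* ++i: compute the new value in a register */
    i = tmp;            /* store it back */
    if (tmp != 4)
        goto LBB0_2;    /* jne: skip the empty if body */
    goto LBB0_2;        /* jmp: leave the empty if body */
LBB0_2:
    tmp = i;            /* i++: keep the old value for the compare */
    i = tmp + 1;        /* store the incremented value */
    if (tmp != 4)
        goto LBB0_4;
    goto LBB0_4;
LBB0_4:
    return 0;
}
```

Both the conditional and the unconditional jump go to the same label, which is why the branches accomplish nothing.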