char c;
int f()
{
return c ^ 1;
}
gcc compiles this into something like
movzbl c(%rip), %eax
xorl $1, %eax
movsbl %al, %eax
ret
Is it useful because of some out-of-order or superscalar feature?
No, that's a GCC missed optimization; that C can legally be compiled to a sign-extending load in the first place. You should report it on the GCC bugzilla with the keyword "missed-optimization".
clang, ICC, and MSVC (on Godbolt) compile it to the expected
f:
movsbl c(%rip), %eax # sign extend first
xorl $1, %eax
retq
Even hand-holding GCC with this C fails to get it to produce that code-gen:
int f() {
int tmp = c;
tmp ^= 1;
return tmp;
}
My guess is that GCC decides to load just 1 byte (zero-extending it) and sign-extend after the XOR instead of before. IDK why it thinks that would be a good idea. But anyway, some kind of extension to 32-bit is necessary either way: a plain byte load into AL would have a false dependency on the old value of RAX.
Writing the C the following way tricks ICC into this missed optimization, but not MSVC or clang. They still optimize this to sign-extending first, because they know that XOR with 1 only flips bit 0, so it can't change the sign bit or any of the high bits produced by sign extension.
int extend_after() {
char tmp = c^1;
return tmp;
}
Now ICC is like GCC, but for some reason sign-extends all the way to 64-bit:
extend_after:
movzbl c(%rip), %eax #10.16
xorl $1, %eax #10.18
movsbq %al, %rax #11.12
ret #11.12
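As a sanity check on the claim that both orderings give the same result, here is a minimal sketch in C that models the two instruction sequences and compares them for every byte value. The helper names (sext_then_xor, xor_then_sext, check_all_bytes) are made up for illustration, and it uses signed char explicitly, assuming the x86-64 System V behaviour where plain char is signed. It only checks the arithmetic; it says nothing about what any particular compiler emits.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: models movsbl (sign-extending load), then xorl $1. */
static int sext_then_xor(signed char c)
{
    return (int)c ^ 1;
}

/* Hypothetical helper: models movzbl, xorl $1, then movsbl %al. */
static int xor_then_sext(signed char c)
{
    unsigned int tmp = (unsigned char)c;  /* zero-extending load */
    tmp ^= 1u;                            /* flips bit 0 only */
    /* Narrowing to signed char is implementation-defined in C;
       GCC and clang wrap modulo 256, which matches movsbl %al. */
    return (signed char)tmp;
}

/* Compares both orderings for every byte value; returns how many were checked. */
static int check_all_bytes(void)
{
    int checked = 0;
    for (int v = SCHAR_MIN; v <= SCHAR_MAX; v++) {
        assert(sext_then_xor((signed char)v) == xor_then_sext((signed char)v));
        checked++;
    }
    return checked;
}
```

Calling check_all_bytes() from a main() walks all 256 values without tripping the assert, which is the property clang, ICC, and MSVC rely on when they sign-extend first and drop the trailing movsbl.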