Is movsbl near ret good for performance?

char c;
int f()
{
    return c ^ 1;
}

gcc compiles this into something like

movzbl  c(%rip), %eax
xorl    $1, %eax
movsbl  %al, %eax
ret

Is it useful because of some out-of-order or superscalar feature?

Solution

No, that's a GCC missed optimization; that C can legally be compiled to a sign-extending load in the first place. You should report it on the GCC bugzilla with the keyword "missed-optimization".

clang, ICC, and MSVC (on Godbolt) compile it to the expected

f:
        movsbl  c(%rip), %eax           # sign extend first
        xorl    $1, %eax
        retq

Even hand-holding GCC toward that code-gen with this C doesn't get it to sign-extend first:

int f() {
    int tmp = c;
    tmp ^= 1;
    return tmp;
}

I'm guessing that GCC decides to load just 1 byte and sign-extend after the XOR instead of before. IDK why it thinks that would be a good idea. But anyway, some kind of extension to 32-bit is necessary to avoid a false dependency on the old value of RAX.

Writing the C like this, with the XOR result narrowed to char before it is returned, tricks ICC into the same missed optimization, but not MSVC or clang. They still sign-extend first, because they know that XOR with 1 can't change the sign bit or any bits above it.

int extend_after() {
    char tmp = c^1;
    return tmp;
}

Now ICC behaves like GCC, but for some reason sign-extends all the way to 64-bit:

extend_after:
        movzbl    c(%rip), %eax                                 #10.16
        xorl      $1, %eax                                      #10.18
        movsbq    %al, %rax                                     #11.12
        ret                                                     #11.12
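
To see why a compiler is allowed to move the sign-extension ahead of the XOR, here is a minimal sketch that checks both orderings against every possible byte value. The function names (extend_first, extend_last, check_all_bytes) are made up for illustration, and signed char is used on the assumption that plain char is signed, as it is on x86-64 where these compilers emit movsbl.

```c
#include <assert.h>
#include <limits.h>

/* Sign-extend to int first, then XOR in 32 bits
   (the movsbl + xorl ordering that clang emits). */
int extend_first(signed char c) {
    int wide = c;                 /* integer conversion = sign-extension */
    return wide ^ 1;
}

/* XOR within the low byte, then sign-extend the 8-bit result
   (the movzbl + xorl + movsbl %al ordering that GCC emits). */
int extend_last(signed char c) {
    unsigned char low = (unsigned char)((unsigned char)c ^ 1u);
    /* Portable manual sign-extension of an 8-bit value. */
    return low >= 128 ? (int)low - 256 : (int)low;
}

/* Compare both orderings for all 256 byte values.
   Returns the number of values checked; asserts on any mismatch. */
int check_all_bytes(void) {
    int checked = 0;
    for (int v = SCHAR_MIN; v <= SCHAR_MAX; ++v) {
        signed char c = (signed char)v;   /* v is always in range */
        assert(extend_first(c) == extend_last(c));
        ++checked;
    }
    return checked;
}
```

Calling check_all_bytes() from a small main() should finish without tripping the assert. XOR with 1 only flips bit 0, so the sign bit that gets replicated into the upper 24 bits is the same whether the extension happens before or after the XOR.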