
Is movsbl near ret good for performance?


char c;
int f()
{
    return c ^ 1;
}

GCC compiles this into something like:

movzbl  c(%rip), %eax
xorl    $1, %eax
movsbl  %al, %eax
ret

Is it useful because of some out-of-order or superscalar feature?


Solution

  • No, that's a GCC missed optimization. That C can legally be compiled to a sign-extending load in the first place, since plain char is signed in the x86-64 System V ABI. You should report it on the GCC bugzilla with the keyword "missed-optimization".

    clang, ICC, and MSVC (on Godbolt) compile it to the expected code:

    f:
            movsbl  c(%rip), %eax           # sign extend first
            xorl    $1, %eax
            retq
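    A minimal sketch of why the sign-extending load is the right one, assuming the x86-64 System V ABI where plain char is signed. The helper f_of_byte is hypothetical (not part of the question); it just stores a byte in the global c and calls f().

```c
#include <assert.h>
#include <limits.h>

char c; /* the global from the question */

/* c is promoted to int before the XOR, so the result is a full 32-bit int */
int f(void) { return c ^ 1; }

/* Hypothetical helper: store an 8-bit value in c and return f().
   Asserts that plain char is signed, which holds on x86-64 System V
   (Linux, macOS); with unsigned char a zero-extending load would be correct. */
int f_of_byte(signed char v) {
    assert(CHAR_MIN < 0);
    c = (char)v;
    return f();
}
```

    With signed char, f_of_byte(-128) returns -127 (0xFFFFFF81): bits 8 to 31 of the result are copies of the sign bit of c, which is exactly what movsbl c(%rip), %eax sets up before the XOR.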
    

    Even trying to hand-hold GCC into that code-gen with this C doesn't get it to sign-extend first:

    int f() {
        int tmp = c;
        tmp ^= 1;
        return tmp;
    }
    

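    I'm guessing that GCC decides to do the XOR at char width: it loads 1 byte, XORs it, and sign-extends after instead of before. I don't know why it thinks that would be a good idea. Either way, the byte load itself needs some kind of extension to 32-bit (here movzbl) to avoid a false dependency on the old value of RAX; a plain movb c(%rip), %al would leave bits 8..63 of RAX unchanged, so it has to merge with whatever RAX held before.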
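    To make the false-dependency point concrete, here is a small architectural model of what each byte load leaves in the full 64-bit RAX. The function names are hypothetical, and the sketch only models register contents, not timing or register renaming.

```c
#include <stdint.h>

/* Hypothetical helpers: old_rax is RAX's previous value, mem is the byte at c. */

/* movb c(%rip), %al: only bits 0..7 are written; bits 8..63 come from
   old_rax, so the result depends on the previous value of RAX. */
uint64_t movb_to_al(uint64_t old_rax, uint8_t mem) {
    return (old_rax & ~(uint64_t)0xFF) | mem;
}

/* movzbl c(%rip), %eax: a 32-bit register write zero-extends into RAX,
   so old_rax does not affect the result at all. */
uint64_t movzbl_to_eax(uint64_t old_rax, uint8_t mem) {
    (void)old_rax; /* no input dependency on the old value */
    return mem;
}

/* movsbl c(%rip), %eax: sign-extend the byte to 32 bits; the 32-bit
   write then zeroes bits 32..63. Also independent of old_rax. */
uint64_t movsbl_to_eax(uint64_t old_rax, uint8_t mem) {
    (void)old_rax;
    uint32_t x = mem;
    if (x & 0x80)
        x |= 0xFFFFFF00u; /* copy the sign bit into bits 8..31 */
    return x;
}
```

    For example, with old_rax = 0x1122334455667788 and mem = 0x80, movb_to_al gives 0x1122334455667780, while movzbl_to_eax gives 0x80 no matter what old_rax was.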

    Writing the C the following way tricks ICC into the same missed optimization, but not MSVC or clang. They still sign-extend first, because they know that XOR with 1 can't change any of the high bits:

    int extend_after() {
        char tmp = c^1;
        return tmp;
    }
    

    Now ICC behaves like GCC, but for some reason it sign-extends all the way to 64-bit:

    extend_after:
            movzbl    c(%rip), %eax
            xorl      $1, %eax
            movsbq    %al, %rax
            ret
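    A minimal sketch checking the claim that XOR with 1 can't change any of the high bits, so sign-extending before or after the XOR gives the same int. The helpers are hypothetical: they take the byte as a parameter instead of reading the global c, and use signed char to match plain char on x86-64 System V.

```c
#include <limits.h>

/* Extend first, then XOR: the order clang/MSVC emit (movsbl, xorl). */
int extend_then_xor(signed char v) {
    int tmp = v; /* sign-extend to 32 bits */
    tmp ^= 1;
    return tmp;
}

/* XOR at char width, then extend: the order GCC's and ICC's asm follows. */
int xor_then_extend(signed char v) {
    /* always in range: XOR with 1 only flips bit 0, never the sign bit */
    signed char tmp = (signed char)(v ^ 1);
    return tmp; /* sign-extended to int on return */
}

/* Count byte values where the two orders disagree; 0 means they are
   interchangeable, so the extension can be folded into the load. */
int count_mismatches(void) {
    int bad = 0;
    for (int v = SCHAR_MIN; v <= SCHAR_MAX; v++)
        if (extend_then_xor((signed char)v) != xor_then_extend((signed char)v))
            bad++;
    return bad;
}
```

    count_mismatches() returns 0 over all 256 values, which is the equivalence that lets clang and MSVC do a single movsbl load followed by the XOR.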