c, assembly, x86-64, micro-optimization

Missing optimization: mov al, [mem] to bitfield-insert a new low byte into an integer


I want to replace the lowest byte in an integer. On x86 this is exactly mov al, [mem] but I can't seem to get compilers to output this. Am I missing an obvious code pattern that is recognized, am I misunderstanding something, or is this simply a missed optimization?

unsigned insert_1(const unsigned* a, const unsigned char* b)
{
    return (*a & ~255) | *b;
}
unsigned insert_2(const unsigned* a, const unsigned char* b)
{
    return *a >> 8 << 8 | *b;
}

GCC does use al, but only to zero the low byte:

        mov     eax, DWORD PTR [rdi]
        movzx   edx, BYTE PTR [rsi]
        xor     al, al
        or      eax, edx
        ret

Clang compiles both practically verbatim:

        mov     ecx, -256
        and     ecx, dword ptr [rdi]
        movzx   eax, byte ptr [rsi]
        or      eax, ecx
        ret

Solution

  • On x86 this is exactly mov al, [mem] but I can't seem to get compilers to output this.

    Try this one, arithmetic-free:

    unsigned insert_4(const unsigned* a, const unsigned char* b)
    {
        unsigned int t = *a;
        unsigned char *tcp = (unsigned char *) & t;
        tcp[0] = *b;
        return t;
    }
    
    

    insert_4(unsigned int const*, unsigned char const*):
            mov     eax, DWORD PTR [rdi]
            mov     al, BYTE PTR [rsi]
            ret
    

    A bit screwy, I know, but compilers are good at removing the indirection and the address-taking for local variables (it took a couple of tries, though).

    godbolt x86-64 gcc 13.1 -O3


    An alternative using union:

    unsigned insert_5(const unsigned* a, const unsigned char* b)
    {
        union {
            unsigned int ui;
            unsigned char uc;
        } u;
        u.ui = *a;
        u.uc = *b;
        return u.ui;
    }
    

    godbolt x86-64 gcc 13.1 -O3
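A third way to express the same byte overwrite is memcpy, sketched below. Reading a union member other than the one last written is fine in C but not in standard C++, and memcpy sidesteps that question in both languages. The name insert_memcpy is made up for illustration, and I have not checked whether every compiler version folds it into the same two mov instructions, so treat the code generation as an assumption to verify on godbolt.

```c
#include <string.h>

/* Hypothetical name, not from the original answer.
   Copies *a, then overwrites the first byte in memory with *b. */
unsigned insert_memcpy(const unsigned *a, const unsigned char *b)
{
    unsigned t = *a;
    memcpy(&t, b, 1);   /* first byte in memory: the low byte on little-endian */
    return t;
}
```

Like insert_4 and insert_5, this touches byte 0 of the object representation, so it replaces the low byte only on a little-endian target.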


    Note that these solutions are endian-specific: they assume little-endian byte order, as on x86, which seems to be what you're looking for. They can be adjusted for big-endian targets as needed.
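One way the endian adjustment could look is sketched here: pick the index of the least significant byte at compile time. This assumes a GCC- or Clang-style compiler that predefines `__BYTE_ORDER__` and `__ORDER_BIG_ENDIAN__`; LOW_BYTE_INDEX and insert_low_byte are hypothetical names, and anything without those macros is treated as little-endian.

```c
/* Hypothetical helper macro: index of the least significant byte of an
   unsigned in memory. Relies on GCC/Clang predefined byte-order macros;
   falls back to little-endian when they are absent (an assumption). */
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define LOW_BYTE_INDEX (sizeof(unsigned) - 1)
#else
#define LOW_BYTE_INDEX 0
#endif

/* Same idea as insert_4, but writes to whichever byte holds bits 0..7. */
unsigned insert_low_byte(const unsigned *a, const unsigned char *b)
{
    unsigned t = *a;
    unsigned char *bytes = (unsigned char *)&t;  /* char access may alias any object */
    bytes[LOW_BYTE_INDEX] = *b;
    return t;
}
```

On x86-64 the index is 0, so this should reduce to the same source as insert_4. On a big-endian target the store lands on the last byte instead.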