x86_64: When is `movzbq` preferable to `movzbl`?

On my x86_64 machine, I used objdump -d to check the encoding of the following two instructions:

movzbl (%rdi),%eax: encoded in 3 bytes (0f b6 07)
movzbq (%rdi),%rax: encoded in 4 bytes (48 0f b6 07)

Because writing a 32-bit register implicitly zeroes the upper 32 bits of the corresponding 64-bit register, movzbl leaves exactly the same value in %rax as movzbq, but with one fewer byte of encoding.

When would the compiler prefer movzbq over movzbl, even though movzbq takes up an extra byte?

Solution

When would the compiler prefer movzbq over movzbl, even though movzbq takes up an extra byte?

Whether movzbq takes up an extra byte depends on the registers used. For example, movzbl (%rdi),%r8d is encoded as 44 0f b6 07 (a REX prefix is needed just to select r8d) and movzbq (%rdi),%r8 is encoded as 4c 0f b6 07 (the same REX prefix with the W bit also set). Both are 4 bytes.

This gives 2 slightly different cases:

a) movzbl is 1 byte shorter, because none of its operands needs a REX prefix. In this case there's no valid reason to choose the longer movzbq, and compilers that do so (when optimization is enabled) are simply bad at instruction selection.

b) movzbl is not shorter, because a REX prefix is needed anyway to reach one of r8 to r15. In this case there's no reason to choose one over the other; it makes no difference at all.
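
To make the two cases concrete, here is a minimal Python sketch that rebuilds the encodings quoted above from the REX and ModRM bit layout. The function name encode_movzx_byte_load and the tiny REG table are made up for illustration, and the sketch only handles a plain (%base) memory operand with no SIB byte or displacement.

```python
# Hardware register numbers for the few registers used in this example.
REG = {"ax": 0, "di": 7, "r8": 8}

def encode_movzx_byte_load(dst, base, wide):
    """Encode movzbl/movzbq (%base),%dst (hypothetical helper).

    wide=False gives the 32-bit destination form (movzbl),
    wide=True gives the 64-bit destination form (movzbq).
    """
    d, b = REG[dst], REG[base]
    # Low base bits 100/101 would need a SIB byte or a displacement.
    assert (b & 7) not in (4, 5), "unsupported base register"
    # REX = 0100WRXB: W = 64-bit operand size, R extends the ModRM.reg
    # field (destination), B extends the ModRM.rm field (base).
    rex = 0x40 | (int(wide) << 3) | ((d >> 3) << 2) | (b >> 3)
    # mod=00 means [base] with no displacement.
    modrm = ((d & 7) << 3) | (b & 7)
    # With no W/R/X/B bit set, the prefix is not needed and is omitted.
    prefix = [] if rex == 0x40 else [rex]
    return bytes(prefix + [0x0F, 0xB6, modrm])

for dst, base in (("ax", "di"), ("r8", "di"), ("ax", "r8")):
    short = encode_movzx_byte_load(dst, base, wide=False)
    long_ = encode_movzx_byte_load(dst, base, wide=True)
    print(dst, base, short.hex(" "), "|", long_.hex(" "))
```

The last pair in the loop shows that case b) also applies when only the address register is one of r8 to r15: the REX prefix is already required for the base, so setting the W bit costs nothing.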

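In both cases, for the compiler developers' convenience, a compiler's choice is likely to lean towards symmetry with movsbl and movsbq, where there is an actual difference: movsbl sign-extends to 32 bits and the upper 32 bits are then zeroed, whereas movsbq sign-extends to all 64 bits.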
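
As a rough illustration of why the sign-extending pair differs while the zero-extending pair does not, the following Python sketch models the value left in the full 64-bit destination register after each instruction. The four functions are named after the mnemonics for readability; they are a model of the architectural result, not a real API.

```python
def sign_extend(value, from_bits, to_bits):
    """Sign-extend the low from_bits of value to a to_bits-wide integer."""
    sign = 1 << (from_bits - 1)
    value &= (1 << from_bits) - 1
    return ((value ^ sign) - sign) & ((1 << to_bits) - 1)

def movzbl(byte):
    # Zero-extend to 32 bits; the 32-bit write zeroes bits 32..63.
    return byte & 0xFF

def movzbq(byte):
    # Zero-extend directly to 64 bits.
    return byte & 0xFF

def movsbl(byte):
    # Sign-extend to 32 bits; the 32-bit write zeroes bits 32..63.
    return sign_extend(byte, 8, 32)

def movsbq(byte):
    # Sign-extend to all 64 bits.
    return sign_extend(byte, 8, 64)

for b in (0x7F, 0x80):
    print(f"byte {b:#04x}: movzbl={movzbl(b):#018x} movzbq={movzbq(b):#018x} "
          f"movsbl={movsbl(b):#018x} movsbq={movsbq(b):#018x}")
```

For every byte value the two zero-extending forms agree, while the sign-extending forms disagree whenever the top bit of the byte is set.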