I am experimenting with how parameters are passed to a function when compiling C++ code. I compiled the following code with the x64 MSVC 19.35/latest compiler to see the resulting assembly:
#include <cstdint>
void f(std::uint32_t, std::uint32_t, std::uint32_t, std::uint32_t);
void test()
{
f(1, 2, 3, 4);
}
and got this result:
void test(void) PROC
mov edx, 2
lea r9d, QWORD PTR [rdx+2]
lea r8d, QWORD PTR [rdx+1]
lea ecx, QWORD PTR [rdx-1]
jmp void f(unsigned int,unsigned int,unsigned int,unsigned int)
void test(void) ENDP
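You can check that this lea sequence really leaves 1, 2, 3 and 4 in ecx, edx, r8d and r9d with a tiny C++ model of the four instructions. This is only a sketch. The Regs struct and the lea32, zext32 and run_msvc_sequence helpers are hypothetical names invented for illustration. The model relies on two x86-64 rules: lea only does address arithmetic and never reads memory, and writing a 32-bit register zero-extends into the full 64-bit register.

```cpp
#include <cstdint>

// Hypothetical model of the registers touched by MSVC's output (not an emulator).
struct Regs {
    std::uint64_t rcx = 0, rdx = 0, r8 = 0, r9 = 0;
};

// lea r32, [base + disp8]: pure address arithmetic, no memory access.
// The 8-bit displacement is sign-extended to 64 bits and added to the base,
// then the result is truncated to 32 bits because the destination is a 32-bit register.
constexpr std::uint32_t lea32(std::uint64_t base, std::int8_t disp8) {
    return static_cast<std::uint32_t>(
        base + static_cast<std::uint64_t>(static_cast<std::int64_t>(disp8)));
}

// Writing a 32-bit register (edx, r9d, ...) zero-extends into the 64-bit register.
constexpr std::uint64_t zext32(std::uint32_t value) { return value; }

// Replays MSVC's four instructions in the same order.
constexpr Regs run_msvc_sequence() {
    Regs r{};
    r.rdx = zext32(2);                 // mov edx, 2
    r.r9  = zext32(lea32(r.rdx, 2));   // lea r9d, [rdx+2]  -> 4
    r.r8  = zext32(lea32(r.rdx, 1));   // lea r8d, [rdx+1]  -> 3
    r.rcx = zext32(lea32(r.rdx, -1));  // lea ecx, [rdx-1]  -> 1
    return r;
}
```

Everything is constexpr (C++14 or later), so the values can be checked at compile time, for example with static_assert(run_msvc_sequence().rcx == 1, "").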
What I do not understand is why the compiler chose lea instead of a simple mov for this example. I understand the mechanics of lea and how it produces the correct value in each register, but I would have expected something more straightforward, like:
void test(void) PROC
mov ecx, 1
mov edx, 2
mov r8d, 3
mov r9d, 4
jmp void f(unsigned int,unsigned int,unsigned int,unsigned int)
void test(void) ENDP
Moreover, from my limited understanding of how modern CPUs work, I suspect the version using lea would be slower, since it adds a dependency between the lea instructions and the mov instruction.
Clang and GCC both give the result I expect, i.e., four mov instructions.
MSVC's code is smaller than the naive mov approach: 16 bytes versus 22, as the offsets in the NASM listing below show. (But as you point out, the dependency on edx means it may be slower; you would have to measure that.)
bits 64
00000000  BA02000000      mov edx, 2
00000005  448D4A02        lea r9d, QWORD [rdx+2]
00000009  448D4201        lea r8d, QWORD [rdx+1]
0000000D  8D4AFF          lea ecx, QWORD [rdx-1]

00000010  B901000000      mov ecx, 1
00000015  BA02000000      mov edx, 2
0000001A  41B803000000    mov r8d, 3
00000020  41B904000000    mov r9d, 4
mov ecx, 1 is 5 bytes: one byte for the opcode (B8-BF, which also encodes the destination register) and 4 bytes for the 32-bit immediate. In particular, unlike some arithmetic instructions, mov has no form that encodes a smaller immediate in fewer bytes using zero- or sign-extension.
lea ecx, [rdx-1] is 3 bytes: one byte for the opcode (8D); one ModR/M byte, which encodes both the destination register ecx and the base register rdx of the memory operand's effective address; and (here is the key) one byte for an 8-bit sign-extended displacement.
The instructions that use r8 and r9 need one extra byte for a REX prefix, but that is true for both mov and lea, so it's a wash.
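To double-check these byte counts, here is a minimal encoder for just the two instruction forms discussed above. It is a hypothetical sketch, not a real assembler. The names mov_r32_imm32, lea_r32_base_disp8, append, msvc_sequence and naive_mov_sequence are made up, and the lea encoder only accepts a legacy base register other than rsp, which would need an extra SIB byte. Register numbers follow the x86-64 encoding: ecx=1, edx=2, r8d=8, r9d=9.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, minimal encoder for two instruction forms only.
using Bytes = std::vector<std::uint8_t>;

// mov r32, imm32: opcode B8+rd, then a 4-byte little-endian immediate.
// r8d-r15d need a REX prefix with REX.B set (0x41).
Bytes mov_r32_imm32(unsigned reg, std::uint32_t imm) {
    Bytes out;
    if (reg >= 8) out.push_back(0x41);
    out.push_back(static_cast<std::uint8_t>(0xB8 + (reg & 7)));
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>(imm >> (8 * i)));
    return out;
}

// lea r32, [base + disp8]: opcode 8D, then a ModR/M byte with mod=01
// (an 8-bit displacement follows), reg=destination, r/m=base, then the
// displacement byte, which the CPU sign-extends.
// r8d-r15d as destination need a REX prefix with REX.R set (0x44).
Bytes lea_r32_base_disp8(unsigned dst, unsigned base, std::int8_t disp) {
    // Assumption: base is a legacy register other than rsp (r/m=100 would
    // require a SIB byte; r8-r15 as base would require REX.B).
    assert(base < 8 && base != 4);
    Bytes out;
    if (dst >= 8) out.push_back(0x44);
    out.push_back(0x8D);
    out.push_back(static_cast<std::uint8_t>(0x40 | ((dst & 7) << 3) | base));
    out.push_back(static_cast<std::uint8_t>(disp));
    return out;
}

void append(Bytes& code, const Bytes& insn) {
    code.insert(code.end(), insn.begin(), insn.end());
}

// MSVC's sequence, in the order of the listing.
Bytes msvc_sequence() {
    Bytes code;
    append(code, mov_r32_imm32(2, 2));           // mov edx, 2
    append(code, lea_r32_base_disp8(9, 2, 2));   // lea r9d, [rdx+2]
    append(code, lea_r32_base_disp8(8, 2, 1));   // lea r8d, [rdx+1]
    append(code, lea_r32_base_disp8(1, 2, -1));  // lea ecx, [rdx-1]
    return code;
}

// The four-mov sequence, in the order of the listing.
Bytes naive_mov_sequence() {
    Bytes code;
    append(code, mov_r32_imm32(1, 1));  // mov ecx, 1
    append(code, mov_r32_imm32(2, 2));  // mov edx, 2
    append(code, mov_r32_imm32(8, 3));  // mov r8d, 3
    append(code, mov_r32_imm32(9, 4));  // mov r9d, 4
    return code;
}
```

msvc_sequence() reproduces the bytes in the NASM listing (16 bytes in total) and naive_mov_sequence() yields 22 bytes. Each lea trades a 4-byte immediate for a ModR/M byte plus a 1-byte displacement, saving 2 bytes, and there are three of them.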