For example, can we do:
movl %eax,%rdx // l means 32 bits, while %rdx is a 64-bit register
So if we move the 32-bit contents of %eax to %rdx, do only the low-order 32 bits of %rdx get updated?
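That instruction will not assemble, because the operand sizes do not match, and the matching-size form movl %eax,%edx zeroes the upper 32 bits of %rdx rather than preserving them. There is no direct way to write to only the low 32 bits of a 64-bit GPR, but it can be emulated by clearing the low 32 bits and bitwise OR-ing the value into it: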
shr rdx, 32
shl rdx, 32
or rdx, rax ; assumes top 32 bits of rax are zero
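To make the bit manipulation easier to follow, here is a minimal C sketch of the same three steps. The function name merge_low32_shift_or is hypothetical, and the uint64_t parameters merely stand in for the registers; it models the arithmetic only, not the timing.

```c
#include <stdint.h>

/* Hypothetical helper: models shr/shl/or on 64-bit values.
   Like the assembly, it assumes the top 32 bits of rax are zero. */
uint64_t merge_low32_shift_or(uint64_t rdx, uint64_t rax)
{
    rdx >>= 32;  /* shr rdx, 32 : drop the old low half          */
    rdx <<= 32;  /* shl rdx, 32 : old high half back, low = 0    */
    rdx |= rax;  /* or  rdx, rax: fill the low half from rax     */
    return rdx;
}
```

If the top 32 bits of rax are not zero, they get OR-ed into the high half of the result, which is why the assumption in the comment matters.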
Or using a double precision shift and rotate:
shrd rdx, rax, 32
ror rdx, 32
Unfortunately this has a higher latency than the first version on most processors (except maybe Core2 and its direct ancestors), because (this form of) the double precision shift commonly takes 3 cycles. On typical Intel processors it can be beneficial in some situations because it takes fewer µops overall, but on AMD Zen and Intel Atom the shrd takes a handful of µops. So in general the first version should be preferred, but in specific cases there may be a reason to use the second version.
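For readers who want to check why the shrd/ror pair works, the following C sketch models the two instructions. The helpers shrd64, ror64 and merge_low32_shrd_ror are hypothetical names introduced here, and the shift helpers are only valid for counts strictly between 0 and 64.

```c
#include <stdint.h>

/* Models shrd dst, src, count: shift dst right, filling the vacated
   top bits with the low bits of src. Valid for 0 < count < 64. */
uint64_t shrd64(uint64_t dst, uint64_t src, unsigned count)
{
    return (dst >> count) | (src << (64 - count));
}

/* Models ror x, count. Valid for 0 < count < 64. */
uint64_t ror64(uint64_t x, unsigned count)
{
    return (x >> count) | (x << (64 - count));
}

/* Hypothetical helper: the shrd/ror sequence from the answer. */
uint64_t merge_low32_shrd_ror(uint64_t rdx, uint64_t rax)
{
    rdx = shrd64(rdx, rax, 32);  /* high = low 32 of rax, low = old high */
    rdx = ror64(rdx, 32);        /* swap the two halves                  */
    return rdx;
}
```

Unlike the shift-and-OR version, this sequence does not depend on the top 32 bits of rax being zero, since the shrd only pulls in the low 32 bits of rax.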
It's also possible with two overlapping stores and a load. This is slow: for example 16 cycles on Haswell due to a store-forwarding failure, and even without hitting such a bad microarchitectural edge case it would not beat the register-only solutions:
mov [rsp], rdx
mov [rsp], eax
mov rdx, [rsp]
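The effect of the overlapping stores can be modelled in C with memcpy into an 8-byte buffer. This is only a sketch: merge_low32_via_memory is a hypothetical name, the buffer stands in for the stack slot at [rsp], and the result relies on a little-endian target such as x86-64, where the low 4 bytes of a 64-bit value come first in memory. A compiler is free to optimize the copies away, so this says nothing about the store-forwarding cost.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: models the store/store/load sequence.
   Assumes a little-endian machine, as on x86-64. */
uint64_t merge_low32_via_memory(uint64_t rdx, uint32_t eax)
{
    unsigned char slot[8];           /* stands in for the 8 bytes at [rsp] */
    memcpy(slot, &rdx, sizeof rdx);  /* mov [rsp], rdx : 8-byte store      */
    memcpy(slot, &eax, sizeof eax);  /* mov [rsp], eax : overwrite low 4   */
    memcpy(&rdx, slot, sizeof rdx);  /* mov rdx, [rsp] : 8-byte reload     */
    return rdx;
}
```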