Consider the following code:
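#include <string.h>

void MemMove8(void* dst, void* src)
{
    char tmp[8];
    memcpy(tmp, src, 8);  /* read all 8 source bytes into a temporary first */
    memcpy(dst, tmp, 8);  /* only then write them to the destination */
}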
MSVC (16.7.1) x86 with /O2 generates the following assembly for this function:
; _dst$ = ecx
; _src$ = edx
mov eax, DWORD PTR [edx]
mov DWORD PTR [ecx], eax
mov eax, DWORD PTR [edx+4]
mov DWORD PTR [ecx+4], eax
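But this breaks when the buffers overlap with dst starting 1 to 7 bytes after src: the first store to [ecx] overwrites some of the bytes at src+4..src+7 before the second load from [edx+4] reads them.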
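To make the failing case concrete, here is a minimal sketch in C. `Interleaved8` and `OverlapDiffers` are hypothetical names introduced only for illustration; `Interleaved8` is a hand-written model of the instruction order in the listing above (load, store, load, store), not actual compiler output. The `MemMove8` here takes a `const` source pointer, otherwise it is the function from the question.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reference semantics: copy through a temporary, so overlap is harmless. */
static void MemMove8(void* dst, const void* src)
{
    char tmp[8];
    memcpy(tmp, src, 8);
    memcpy(dst, tmp, 8);
}

/* Hypothetical model of the emitted instruction order:
   load low dword, store it, load high dword, store it. */
static void Interleaved8(unsigned char* dst, const unsigned char* src)
{
    uint32_t w;
    memcpy(&w, src, 4);      /* mov eax, DWORD PTR [edx]   */
    memcpy(dst, &w, 4);      /* mov DWORD PTR [ecx], eax   */
    memcpy(&w, src + 4, 4);  /* mov eax, DWORD PTR [edx+4] */
    memcpy(dst + 4, &w, 4);  /* mov DWORD PTR [ecx+4], eax */
}

/* Runs both versions with dst = src + 4 inside one 12-byte buffer.
   Returns 1 if the interleaved result differs from the reference. */
static int OverlapDiffers(unsigned char ref[12], unsigned char bad[12])
{
    for (int i = 0; i < 12; ++i)
        ref[i] = bad[i] = (unsigned char)i;

    MemMove8(ref + 4, ref);
    assert(ref[4] == 0 && ref[11] == 7);  /* reference copy must be exact */

    Interleaved8(bad + 4, bad);
    return memcmp(ref, bad, 12) != 0;
}
```

With dst = src + 4, the reference version leaves bytes 0..7 of the original data at offsets 4..11, while the interleaved version repeats bytes 0..3 at offsets 8..11, because the second load reads what the first store just wrote.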
The generated code seems wrong to me; or is this a valid transformation and I am missing something here?
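It's a bug. Each memcpy call copies between tmp and a caller-supplied buffer, and tmp is a distinct local array, so neither call has overlapping arguments. The function is therefore well-defined even when dst and src overlap, and the compiler must behave as if all 8 bytes are read before any are written.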
This seems to have been fixed in VS 16.8.