I'm trying to load and store memory through a char pointer using the 128-bit XMM0 register on a 32-bit operating system.
What I tried is very simple:
int main() {
char *data = new char[33];
for (int i = 0; i < 32; i++)
data[i] = 'a';
data[32] = 0;
__asm
{
movdqu xmm0,[data]
}
delete[] data;
}
The problem is that this doesn't seem to work. The first time I debugged the Win32 application I got:
xmm0 = 0024F8380000000000F818E30055F158
The second time I debugged it I got:
xmm0 = 0043FD6800000000002C18E3008CF158
So there must be something wrong with the line:
movdqu xmm0,[data]
I tried using this instead:
movdqu xmm0,data
but I got the same result.
I thought the problem was that I was copying the address instead of the data at that address. However, the value shown in the xmm0 register is too large for a 32-bit address, so it must be copying memory from another address.
I also tried some other instructions I found on the internet, but with the same result.
Is it the way I'm passing the pointer or am I misunderstanding something about xmm basics?
A valid solution with an explanation will be appreciated.
Even though I found the solution (finally after three hours), I would still like an explanation:
__asm
{
push eax
mov eax,data
movdqu xmm0,[eax]
pop eax
}
Why do I have to pass the pointer through a 32-bit register?
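#include <cstdio>   // printf
int main()
{
    char *dataptr = new char[33];   // pointer variable, points to the heap
    char datalocal[33];             // array stored directly on the stack
    dataptr[0] = 'a'; dataptr[1] = 0;
    datalocal[0] = 'a'; datalocal[1] = 0;
    // %p expects a void *, so cast explicitly
    printf("%p %p %c\n", (void *)dataptr, (void *)&dataptr, dataptr[0]);
    printf("%p %p %c\n", (void *)datalocal, (void *)&datalocal, datalocal[0]);
    delete[] dataptr;
}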
Output:
0xd38050 0x7635bd709448 a
0x7635bd709450 0x7635bd709450 a
As we can see, the dynamic pointer dataptr is really a pointer variable (32 or 64 bits wide, located at 0x7635BD709448) containing a pointer to the heap, 0xD38050.
The local variable is directly a 33-character buffer, allocated at address 0x7635BD709450.
But datalocal also works as a char * value.
I'm a bit confused about what the formal C++ explanation of this is. While writing C++ code, this feels quite natural: dataptr[0] is the first element in the heap memory (that is, two memory accesses: one to read the pointer variable, one to follow it), but in assembler you see the true nature of the dataptr symbol, which is the address of the pointer variable. So you first have to load the heap pointer with mov eax,[data], which loads eax with 0xD38050, and then you can load the content at 0xD38050 into XMM0 by using [eax].
With a local array there is no separate variable holding its address; the symbol datalocal is already the address of the first element, so movdqu xmm0,[datalocal] will work there.
In the "wrong" case you can still execute movdqu xmm0,[data]; the CPU has no problem loading 128 bits from a 32-bit variable. It simply continues reading past the 32 bits and picks up another 96 bits belonging to whatever variables or data happen to be stored next to the pointer. If the variable sits at the very end of the last mapped memory page of the application, the load will crash with an invalid access.
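Alignment was mentioned a few times in the comments. That's a valid point, with one caveat: movdqu itself accepts unaligned addresses, while its aligned counterpart movdqa faults unless the address is 16-byte aligned. If you want aligned data, check your C++ compiler's extensions. For Visual Studio this should work: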
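__declspec(align(16)) char datalocal[33];
// _aligned_malloc (declared in <malloc.h>) returns void *, so C++ needs a cast
char *dataptr = static_cast<char *>(_aligned_malloc(33, 16));
_aligned_free(dataptr);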
About my C++ interpretation: maybe I had this wrong from the beginning.
dataptr is the value of the dataptr symbol, that is, the heap address. Then dataptr[0] dereferences the heap address, accessing the first element of the allocated memory. &dataptr is the address of the dataptr value. This also makes sense with syntax like dataptr = nullptr;, where you store the nullptr value into the dataptr variable rather than overwriting the address of the dataptr symbol.
With datalocal[] there's basically no sense in using the bare datalocal, as in datalocal = 'a';, because it's an array variable, so you should always provide the [] index. And &datalocal is the address of that array. The bare datalocal is then an aliased shortcut for easier pointer math with arrays, etc., which also converts to the char * type. If the bare datalocal were a syntax error, it would still be possible to write C++ code (using &datalocal for the pointer and datalocal[..] for elements), and it would fit the dataptr logic completely.
Conclusion: your example was wrong from the beginning, because in assembly language [data] loads the value of data, which is the pointer to the heap returned by new.
This is my own explanation, and now some C++ expert will come and tear it to pieces from a formal point of view... :)))