Consider the following x86 ASM instructions:
mov eax, [ebp+StrPointer]
add eax, 1
mov [ebp+StrPointer], eax
Here, [ebp+StrPointer] is a pointer to some heap-allocated string. Apparently, taken together, these instructions update the pointer so that it points to the next character in the string.
What I don't understand is why the second instruction, add eax, 1, doesn't add 4 instead. I understand that 1 is added to the eax register and that, since memory is byte-addressed, the next memory address is 1 byte further on. But how/why is 1 interpreted as "1 byte" and not just 1?
For example, if [ebp+StrPointer] had a value of 0xFFF0, then I would think that add eax, 1 would result in a value of 0xFFF1, but the behavior described above suggests it would be 0xFFF4. Why?
In C, I can understand the explanation that the compiler turns something like
int a = 1;
int *ptr = &a;
ptr = ptr + 1;
into
ptr = ptr + sizeof(int)*1
But at the assembly level there is no sizeof call. So what's going on?
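Strings in C are char (not int) arrays. The compiler knows the size of the object referenced by the pointer and adjusts the address in the generated code accordingly. Since sizeof(char) is 1, advancing a char pointer by one element adds exactly 1 to the address, which is why the code uses add eax, 1: 0xFFF0 becomes 0xFFF1, not 0xFFF4.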
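To make the 0xFFF0 example from the question concrete, here is a minimal sketch that models the scaling as plain integer arithmetic. The helper name next_element_address is hypothetical, and the element size of 4 used in the note below is an assumption standing in for a typical 32-bit int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: models the address arithmetic a compiler emits for
   "ptr + 1". The element size is known at compile time, so it ends up as a
   constant in the instruction and no sizeof lookup happens at run time. */
uintptr_t next_element_address(uintptr_t address, size_t element_size)
{
    assert(element_size > 0);   /* every complete object type has a nonzero size */
    return address + element_size;
}
```

With an address of 0xFFF0, an element size of 1 (char) gives 0xFFF1, whereas an element size of 4 gives 0xFFF4. The assembly in the question is the first case.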
Bear in mind that ++ptr moves the pointer held in ptr to the next object, not the next byte. Compare the x86-64 code generated for a char * and for an int *:
void foo(char *s)
{
    while(*s)
    {
        *s++ = 'a';
    }
}
foo: # @foo
cmp byte ptr [rdi], 0
je .LBB0_3
inc rdi
.LBB0_2: # =>This Inner Loop Header: Depth=1
mov byte ptr [rdi - 1], 97
cmp byte ptr [rdi], 0
lea rdi, [rdi + 1]
jne .LBB0_2
.LBB0_3:
ret
void bar(int *i)
{
    while(*i)
    {
        *i++ = 9999;
    }
}
bar: # @bar
cmp dword ptr [rdi], 0
je .LBB1_3
add rdi, 4
.LBB1_2: # =>This Inner Loop Header: Depth=1
mov dword ptr [rdi - 4], 9999
cmp dword ptr [rdi], 0
lea rdi, [rdi + 4]
jne .LBB1_2
.LBB1_3:
ret
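The 1-byte and 4-byte strides in the two listings above can also be checked from C itself. This is a small sketch, not part of the original code; the function names are illustrative, and the 4-byte result for int assumes a platform where sizeof(int) is 4, as on typical x86-64 targets.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes the address moves when a char pointer is incremented once. */
size_t char_pointer_step(void)
{
    char buffer[2] = {0};
    char *ptr = buffer;
    char *start = ptr;
    ++ptr;
    return (size_t)(ptr - start);   /* sizeof(char) is 1 by definition */
}

/* Bytes the address moves when an int pointer is incremented once. */
size_t int_pointer_step(void)
{
    int buffer[2] = {0};
    int *ptr = buffer;
    char *start = (char *)ptr;      /* view the same address in byte units */
    ++ptr;
    size_t step = (size_t)((char *)ptr - start);
    assert(step == sizeof(int));    /* the compiler scaled "+1" by sizeof(int) */
    return step;
}
```

The source says ++ptr in both functions, but the compiler turns it into an addition of 1 in the first and of sizeof(int) in the second, matching the inc rdi and add rdi, 4 instructions shown for foo and bar.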