c pointers assembly memory

Why does the add instruction in assembly seemingly make an implicit sizeof() conversion?


Consider the following x86 ASM instructions:

mov     eax, [ebp+StrPointer]
add     eax, 1
mov     [ebp+StrPointer], eax 

Here, [ebp+StrPointer] is a pointer to some heap-allocated string. Taken together, these instructions apparently update the pointer to point to the next character in the string.

What I don't understand is why the second instruction, add eax, 1, doesn't add 4 instead. I can see that 1 is added to the eax register, and since memory is byte-addressed, the next memory address is 1 byte further on. But how/why is 1 interpreted as "1 byte" and not just 1?

For example, if [ebp+StrPointer] held the value 0xFFF0, then I would think that add eax, 1 would result in 0xFFF1, but the behavior described above suggests it would be 0xFFF4. Why?

In C, I can understand the explanation that the compiler turns something like

int a = 1;
int *ptr = &a;
ptr = ptr + 1;

into

ptr = ptr + sizeof(int)*1

But at the assembly level there is no sizeof call. So what's going on?


Solution

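  • Strings in C are char arrays (not int arrays). The compiler knows the size of the object referenced by the pointer and adjusts the address in the generated code accordingly. The scaling happens at compile time, so the assembly contains only the already-multiplied constant: for a char * that constant is 1, so add eax, 1 turns 0xFFF0 into 0xFFF1, while for an int * the compiler would emit an add of 4.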

    Bear in mind that ++ptr advances ptr to the next object, not the next byte.

    void foo(char *s)
    {
        while(*s) 
        {
            *s++ = 'a';
        }
    }
    
    foo:                                    # @foo
            cmp     byte ptr [rdi], 0
            je      .LBB0_3
            inc     rdi
    .LBB0_2:                                # =>This Inner Loop Header: Depth=1
            mov     byte ptr [rdi - 1], 97
            cmp     byte ptr [rdi], 0
            lea     rdi, [rdi + 1]
            jne     .LBB0_2
    .LBB0_3:
            ret
    
    void bar(int *i)
    {
        while(*i) 
        {
            *i++ = 9999;
        }
    }
    
    bar:                                    # @bar
            cmp     dword ptr [rdi], 0
            je      .LBB1_3
            add     rdi, 4
    .LBB1_2:                                # =>This Inner Loop Header: Depth=1
            mov     dword ptr [rdi - 4], 9999
            cmp     dword ptr [rdi], 0
            lea     rdi, [rdi + 4]
            jne     .LBB1_2
    .LBB1_3:
            ret