
What's the point of the pointer in MASM?


I am learning Assembly Language For Intel-Based Computers, and there is something that quite confuses me.

It is known that we can use any general-purpose register for addressing, like this:

.data
array1 byte 1,2,3,4
.code
mov esi,offset array1
mov al,[esi] ; al=1
mov al,[esi+1] ; al=2
mov al,[esi+2] ; al=3
mov al,[esi+3] ; al=4

The [esi+idata] form (where idata is an immediate value) already works like a pointer, and it seems powerful enough.

But the book shows how to define a pointer type with TYPEDEF, like this:

pbyte typedef ptr byte
.data
array1 byte 1,2,3,4
ptr1 pbyte array1
.code
mov esi,ptr1
mov al,[esi] ; al=1
mov al,[esi+1] ; al=2
mov al,[esi+2] ; al=3
mov al,[esi+3] ; al=4

So what really confuses me is that, since array1 is already a pointer to the data, what's the point of ptr1? The pointer also requires 4 bytes of storage.

Isn't mov esi,ptr1 equal to mov esi,offset array1?

Also, the book gives different pointer types, like:

pbyte typedef ptr byte
pword typedef ptr word
pdword typedef ptr dword
pqword typedef ptr qword

Doesn't a pointer just point to one byte in memory? What's the difference? I tried this:

pt typedef ptr qword 
.data 
array byte 1,2,3,4,5,6,7,8
arrptr pt array
.code
mov esi,arrptr
mov eax,[esi]

And I get 04030201 in eax, a dword type, not a qword.

In summary, I wonder whether pointers are really necessary in MASM. Or is there something that a pointer can do but [esi+idata] can't?


Solution

  • Isn't mov esi,ptr1 equal to mov esi,offset array1?

    Beware that mov esi, ptr1 actually means mov esi, DWORD PTR [ptr1], and that's an important distinction.
    It involves one more level of indirection than mov esi, OFFSET array1.

    Simply put: if array1 resides at, say, 0x1000, then mov esi, OFFSET array1 is just the readable version of mov esi, 1000h (the actual instruction executed).
    By contrast, if ptr1 is at 0x1010, mov esi, ptr1 is assembled into mov esi, DWORD PTR [1010h]: read the content of the DWORD at 0x1010 and store it into ESI.

    For the two instructions to behave the same, the DWORD at 0x1010 must contain 0x1000 (i.e. ptr1 must point to array1).

    It could, however, point to any other array as well.
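
    The difference between the two forms can be sketched as follows. This is a minimal illustration, not code from the book: array2 and the main procedure are hypothetical names added for the example, and the program assumes a 32-bit flat model.

    ```masm
    .386
    .model flat, stdcall
    .stack 4096

    pbyte typedef ptr byte

    .data
    array1 byte 1,2,3,4
    array2 byte 10,20,30,40      ; hypothetical second array, added for illustration
    ptr1   pbyte array1          ; a DWORD in memory holding the address of array1

    .code
    main proc
        mov esi, offset array1   ; immediate operand: the address is fixed when the program is built
        mov al, [esi]            ; al = 1

        mov esi, ptr1            ; memory operand: load the DWORD stored in ptr1
        mov al, [esi]            ; al = 1, because ptr1 currently holds the address of array1

        mov ptr1, offset array2  ; retarget the pointer at run time
        mov esi, ptr1            ; the very same instruction as before...
        mov al, [esi]            ; ...now yields al = 10
        ret
    main endp
    end main
    ```

    The instruction mov esi, ptr1 does not change, yet it reads from whichever array ptr1 holds at that moment. OFFSET cannot do that, since its value is baked into the instruction.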
    In your example ptr1 is pretty much useless, not because you only have a single array but because the data is statically allocated, so you can always get its address with OFFSET.
    Imagine you want to discard the first word of a string: a procedure can take a pointer as a parameter (which can be initialized with OFFSET if the data is statically allocated) and return a pointer to the first non-discarded word.

    +--- Original pointer, initialized with OFFSET
    |
    v
    Hello world from SO!
          ^
          |
          +--- Pointer returned
    

    You can save the result into a variable called, say, ptr1.
    Now if you want to repeat the process, you need to use the value of ptr1 as the input for the procedure.
    No OFFSET will do, since the value of the pointer cannot be deduced at assembly time.
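
    A rough sketch of that scenario is shown below. SkipWord is a hypothetical procedure written for this illustration; passing the pointer in ESI and returning it in EAX is an assumption made to keep the example short, not a convention taken from the book.

    ```masm
    .386
    .model flat, stdcall
    .stack 4096

    pbyte typedef ptr byte

    .data
    msg   byte "Hello world from SO!",0
    ptr1  pbyte ?                    ; no useful value until run time

    .code
    ; SkipWord (hypothetical): ESI = pointer to a zero-terminated string.
    ; Returns in EAX a pointer to the character just after the first space,
    ; or a pointer to the terminating zero if the string contains no space.
    SkipWord proc
        mov eax, esi
    next_char:
        cmp byte ptr [eax], 0
        je  finished                 ; reached the terminator: return a pointer to it
        inc eax
        cmp byte ptr [eax-1], ' '
        jne next_char                ; the character just passed was not a space
    finished:
        ret
    SkipWord endp

    main proc
        mov  esi, offset msg         ; first call: the address is known at assembly time
        call SkipWord
        mov  ptr1, eax               ; ptr1 now points to "world from SO!"

        mov  esi, ptr1               ; second call: the address exists only at run time
        call SkipWord
        mov  ptr1, eax               ; ptr1 now points to "from SO!"
        ret
    main endp
    end main
    ```

    Only the first call can use OFFSET. Every later call has to read the current value back from ptr1.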


  • Doesn't a pointer just point to one byte in memory? What's the difference?

    In general, giving the assembler more information than it strictly needs comes in handy to:

    1. Document the code
    2. Let it perform some initialization
    3. Let it perform some checks

    Knowing that a variable is a pointer can help future readers.

    TYPEDEF PTR can be used to specify near and far pointers, but that's irrelevant for protected mode as set up by any modern OS.
    A far pointer is initialized differently from a near one.

    Maybe MASM performs some checks on the type of the pointer and the pointee? I doubt it, but admittedly I don't know.

    As pointed out by Michael Petch in the comments:

    MASM does in fact (unlike other assemblers) have rudimentary type checking on pointers to data that have been explicitly defined and then directly referenced.


  • And I get 04030201 in eax, a dword type, not a qword.

    The destination register EAX is 32 bits wide, so the source cannot be 64-bit: the load reads 32 bits regardless of the pointer's declared type.
    Whenever the programmer gives an explicit size (e.g. through a register name or with WORD PTR [...]), it overrides the implicit size.
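
    As an illustration, the snippet from the question can be extended to read the same bytes with several operand sizes. The extra loads and the main procedure are additions made for this sketch, and a 32-bit flat model is assumed.

    ```masm
    .386
    .model flat, stdcall
    .stack 4096

    pt typedef ptr qword

    .data
    array  byte 1,2,3,4,5,6,7,8
    arrptr pt array

    .code
    main proc
        mov esi, arrptr              ; esi = address of array
        mov eax, [esi]               ; size taken from EAX: eax = 04030201h
        mov edx, [esi+4]             ; upper half: edx = 08070605h, so EDX:EAX holds the qword
        mov ax, word ptr [esi]       ; explicit 16-bit load: ax = 0201h
        movzx ecx, byte ptr [esi+7]  ; explicit 8-bit load, zero-extended: ecx = 8
        ret
    main endp
    end main
    ```

    In 32-bit code no general-purpose register can hold a qword, so reading all eight bytes takes two 32-bit loads, as the EDX:EAX pair does here.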


  • Is there something that a pointer can do but [esi+idata] can't?

    No, any high-level code feature is ultimately assembled into the ISA instructions, so by using them, you can emulate any high-level feature.

    Note that some directives, like .data, work at a meta level.
    These cannot be emulated, as they don't simply generate code.