MASM Offset vs. Label for addressing

I'm currently working my way through the Irvine x86 Assembly book, and I'm on chapter four.

They've introduced the OFFSET operator, but I'm confused about why I'd ever use it. Why wouldn't I just use the label (which is already the address of that data)? It seems like OFFSET just adds extra noise.

I have this small program to illustrate my point. I have a label for some data called array, and I can move the elements of my array into al. The book, however, talks about using the OFFSET operator to get the address of array and move it into esi. That seems unnecessary to me, since I could just use the label.

Below are two sections of code that do the same thing: one uses the label to access the elements of the array, and the other uses OFFSET to move the address into esi and then accesses the elements through esi.

.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode: DWORD

.data
    array   BYTE 10h, 20h, 30h, 40h, 50h

.code
main PROC
    xor eax, eax        ; set eax to 0

    ; Using Labels
    mov al, array      
    mov al, [array + 1]
    mov al, [array + 2]
    mov al, [array + 3]
    mov al, [array + 4]

    ; Using Offset
    mov esi, OFFSET array
    mov al, [esi]
    mov al, [esi + 1]
    mov al, [esi + 2]
    mov al, [esi + 3]
    mov al, [esi + 4]

    INVOKE ExitProcess, 0
main ENDP
END main

Are they really just two ways to achieve the same thing?

Later on in the book when talking about pointers, they have this example:

.data
arrayB byte 10h, 20h, 30h, 40h
ptrB dword arrayB

And this makes sense to me: ptrB holds the address of arrayB. But then they say, "Optionally, you can declare ptrB with the OFFSET operator to make the relationship clearer:"

ptrB dword OFFSET arrayB

That doesn't make it any clearer to me. I already know arrayB is an address. It looks like OFFSET is just thrown in there without really doing anything; removing OFFSET from that last line would achieve exactly the same thing. What exactly does OFFSET do if I can just use a label to get the address anyway?

Solution

Are they really just two ways to achieve the same thing?

Yes, assembly has many ways to do things.

The C equivalent would be char *p = array; and then using p[0], p[1], etc., vs. using array[0], array[1], etc. directly.
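Below is a minimal C sketch of that analogy, using the question's five bytes. The function names are hypothetical. A compiler is free to encode either form however it likes, so this only shows that both forms read the same data.

```c
#include <assert.h>
#include <stddef.h>

/* The five bytes from the question's .data section. */
static const unsigned char array[] = {0x10, 0x20, 0x30, 0x40, 0x50};

/* "Using labels": index the static array directly, like [array + i]. */
unsigned char load_via_label(size_t i) {
    assert(i < sizeof array);
    return array[i];
}

/* "Using OFFSET": copy the address into a pointer first
   (like mov esi, OFFSET array), then index relative to it, like [esi + i]. */
unsigned char load_via_pointer(size_t i) {
    assert(i < sizeof array);
    const unsigned char *p = array;
    return p[i];
}
```

Both functions return 0x30 for i == 2. The pointer version just has an extra named copy of the address, which is what ESI is in the assembly.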

The advantage of putting a pointer in a register is that it saves code size when you use it repeatedly. mov al, [esi] is a 2-byte instruction with just opcode + ModRM (3 bytes with a disp8, as in [esi + 1]). The alternative encodes the 4-byte absolute address into every instruction separately for a [disp32] addressing mode.

The other advantage is that you can increment the pointer with inc esi. Whenever you don't fully unroll a loop, you need either a pointer or an index in a register.
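Below is a rough C analogue of the two loop styles: walking a pointer that gets incremented, and indexing from a fixed base with a counter. The function names are hypothetical. Both return the same sum; the machine code a compiler actually emits for each is up to the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Walk a pointer forward one byte per iteration (like inc esi). */
unsigned sum_by_pointer(const unsigned char *p, size_t n) {
    assert(p != NULL || n == 0);
    const unsigned char *end = p + n;
    unsigned total = 0;
    while (p != end) {
        total += *p;   /* load through the pointer, like [esi] */
        p++;           /* like inc esi */
    }
    return total;
}

/* Keep the base fixed and step an index instead (like [array + ecx], inc ecx). */
unsigned sum_by_index(const unsigned char *base, size_t n) {
    assert(base != NULL || n == 0);
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += base[i];
    return total;
}
```

For the question's bytes 10h through 50h, both return 240 (0xF0).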

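A plain pointer is usually better than [array + ecx], and especially better than [array + ecx*4], because indexed addressing modes have some downsides. For example, on some Intel Sandybridge-family CPUs, instructions using them can be un-laminated and lose micro-fusion. [array + ecx] is technically not indexed: it's [base + disp32], it doesn't need a SIB byte, and it doesn't count as indexed for micro-fusion purposes (see "Micro fusion and addressing modes").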

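You can use byte offsets instead of element indices, though. For example, add ecx, TYPE array steps by one element. That lets you access a static array of DWORDs with [array + ecx], a [base + disp32] addressing mode, instead of [array + ecx*4], which is [disp32 + idx*scale].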
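Below is a C sketch of the byte-offset idea. It assumes a hypothetical static array, dwords, of four 32-bit elements. One loop indexes by element, which is the C-level counterpart of scaling by 4. The other keeps a byte offset and steps it by sizeof(uint32_t), as add ecx, TYPE dwords would in MASM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical static DWORD array. */
static const uint32_t dwords[] = {100, 200, 300, 400};

/* Element index, scaled at each access: like [dwords + ecx*4]. */
uint32_t sum_scaled_index(void) {
    uint32_t total = 0;
    for (size_t i = 0; i < sizeof dwords / sizeof dwords[0]; i++)
        total += dwords[i];
    return total;
}

/* Byte offset stepped by the element size: like [dwords + ecx]
   with add ecx, TYPE dwords (TYPE is 4 for a DWORD array). */
uint32_t sum_byte_offset(void) {
    const unsigned char *base = (const unsigned char *)dwords;
    uint32_t total = 0;
    for (size_t off = 0; off < sizeof dwords; off += sizeof dwords[0]) {
        uint32_t v;
        memcpy(&v, base + off, sizeof v);  /* load 4 bytes at base + off */
        total += v;
    }
    return total;
}
```

Both loops visit the same elements and return 1000. The only difference is whether the scaling by 4 happens in the address calculation or in the counter update.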

Using [disp32] every time avoids needing an extra instruction to put the address in a register. mov reg, imm32 is only a 5-byte single-uop instruction, but it still might not be worth it for performance before just a couple of static array accesses. It might depend on how often your code is already hot in the uop cache vs. how often it has to be fetched and decoded. Saving code size improves the L1 I-cache hit rate, or at least means more instructions fit in one cache line. So in code that's not in the hottest inner loop, it can be worth using more instructions or more uops if that saves code size.
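Below is a C sketch that puts rough numbers on the straight-line case. It totals the code bytes for n consecutive byte loads like the ones in the question. The instruction sizes are assumptions based on the standard x86-32 encodings (listed in the comment), not read from an assembler listing. The function names are hypothetical.

```c
#include <assert.h>

/* Assumed x86-32 encoding sizes:
     mov al, [disp32]       5 bytes  (short A0 moffs32 form; other registers
                                      need opcode + ModRM + disp32 = 6 bytes)
     mov esi, OFFSET array  5 bytes  (opcode + imm32)
     mov al, [esi]          2 bytes  (opcode + ModRM)
     mov al, [esi + k]      3 bytes  (opcode + ModRM + disp8, for k <= 127) */

/* n loads through the label: every load carries the full 32-bit address. */
unsigned bytes_via_label(unsigned n, unsigned abs_load_size) {
    return n * abs_load_size;
}

/* n loads at offsets 0..n-1 through ESI (assumes n <= 128 so disp8 fits):
   one 5-byte setup, a 2-byte load at offset 0, then 3-byte loads. */
unsigned bytes_via_pointer(unsigned n) {
    assert(n <= 128);
    if (n == 0)
        return 0;  /* no loads, so no setup needed either */
    return 5 + 2 + (n - 1) * 3;
}
```

With the 5-byte AL form, the question's five loads come to 25 bytes via the label vs. 19 via ESI, and two loads tie at 10 bytes each. With a register that needs the 6-byte form, the pointer version is smaller from two loads on. Code size is only part of the picture, as the paragraph above explains.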

Before a loop (not fully unrolled), you'd normally need an instruction to zero a loop counter or index anyway, like xor ecx, ecx (2 bytes). Using mov reg, imm32 instead is only 3 bytes longer, with no extra uops. An indexed addressing mode carries a disp32, plus a SIB byte for a scaled index, so using the pointer saves 4 or 5 bytes per reference. That means you already come out ahead with just one array reference per iteration, still at no extra uop cost. (This ignores any minor differences between the outside-the-loop cost of executing an xor-zeroing vs. a mov-immediate instruction.)
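Below is the loop arithmetic above as a tiny C function. It counts only what that paragraph counts: the 3-byte-longer setup, and the disp32 (plus SIB for a scaled index) saved for each array reference in the loop body. It deliberately ignores differences in the increment and compare instructions. The name net_bytes_saved is hypothetical.

```c
#include <assert.h>

/* Net static code-size change from using a pointer register instead of an
   index register with [array + ecx] (scaled_index == 0) or
   [array + ecx*4] (scaled_index != 0). Positive means the pointer is smaller. */
int net_bytes_saved(int refs_in_loop_body, int scaled_index) {
    assert(refs_in_loop_body >= 0);
    const int setup_extra = 5 - 2;                  /* mov esi, imm32 vs. xor ecx, ecx */
    const int saved_per_ref = scaled_index ? 5 : 4; /* disp32 (+ SIB) no longer encoded */
    return refs_in_loop_body * saved_per_ref - setup_extra;
}
```

One reference already gives +1 byte ([array + ecx]) or +2 bytes ([array + ecx*4]), and the savings grow with each additional reference in the loop body.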

Note that for x86-64, you'd typically put a static address in a register with a 7-byte RIP-relative LEA. And for your code to be LargeAddressAware at all, you can't use [array + rcx], because that only works with a [disp32 + reg] addressing mode, not [RIP + rel32].

And BTW, for consistency I'd recommend this over mov al, array:

    mov al, [array + 0]
    mov al, [array + 1]
    ...

The first comment under your question is from someone you confused by writing mov al, array and then mov al, [array + 1], i.e. two different syntaxes for similar addresses. I think Jester thought you intended something like mov al, OFFSET array. BTW, you could instead write it this way (I think):

mov al, array
mov al, array + 1

but I always recommend using square brackets around a memory operand for clarity. This matters especially if you ever look at NASM syntax, where they're always required, and some people recommend that convention even if you only use MASM. But beware that MASM ignores brackets in some cases where there's no register (see "Confusing brackets in MASM32"), so don't assume that using brackets in MASM makes it work like NASM.

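BTW, the performance-efficient way to load a single byte is to zero-extend it into the full register, instead of merging it into the low byte of the existing register. On many CPUs, the merge makes the load depend on the old value of EAX. The zero-extending load is: movzx eax, byte ptr [esi]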
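The dependency difference can be modeled in C as plain data flow. This is only a sketch: it shows which inputs each result depends on, not how any particular CPU renames registers. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* mov al, [mem]: only bits 0..7 change; bits 8..31 come from the old EAX,
   so the result has an input dependency on EAX's previous value. */
uint32_t load_byte_merge(uint32_t old_eax, uint8_t mem_byte) {
    return (old_eax & 0xFFFFFF00u) | mem_byte;
}

/* movzx eax, byte ptr [mem]: the whole register is written, so the result
   depends only on the loaded byte. */
uint32_t load_byte_movzx(uint8_t mem_byte) {
    uint32_t eax = (uint32_t)mem_byte;  /* upper 24 bits are always zero */
    assert(eax <= 0xFFu);
    return eax;
}
```

With old EAX = 0x12345678 and a loaded byte of 0x30, the merge gives 0x12345630 while movzx gives 0x00000030. They only agree when the upper bits were already zero, as after the question's xor eax, eax.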

Also BTW, yes, mov esi, OFFSET array (5 bytes) is the most efficient way to put a static address in a register, for both code size and performance. lea esi, array is 6 bytes (opcode + ModRM + [disp32] addressing mode) and can run on fewer execution ports. In 32-bit mode, never use LEA with an addressing mode that contains no register; mov reg, imm32 does that job better.
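To make the 5- vs. 6-byte difference concrete, this C sketch hand-assembles both instructions for ESI. It assumes the standard 32-bit encodings: mov r32, imm32 is B8+reg, and lea is 8D /r with ModRM mod=00, rm=101, meaning a bare disp32 in 32-bit mode. The function names are hypothetical, and 0x00404000 in the note below is just a made-up address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value little-endian, as x86 lays out imm32/disp32 fields. */
static void put_le32(uint8_t *out, uint32_t v) {
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* mov esi, imm32: opcode B8 + register number (ESI = 6), then imm32. */
size_t encode_mov_esi_imm32(uint8_t *out, uint32_t addr) {
    assert(out != NULL);
    out[0] = (uint8_t)(0xB8 + 6);
    put_le32(out + 1, addr);
    return 5;
}

/* lea esi, [disp32]: opcode 8D, ModRM mod=00 reg=ESI(6) rm=101, then disp32.
   In 64-bit mode the same mod=00/rm=101 pattern means [RIP + rel32] instead. */
size_t encode_lea_esi_disp32(uint8_t *out, uint32_t addr) {
    assert(out != NULL);
    out[0] = 0x8D;
    out[1] = (uint8_t)((0u << 6) | (6u << 3) | 5u);  /* = 0x35 */
    put_le32(out + 2, addr);
    return 6;
}
```

For 0x00404000, the mov encodes as BE 00 40 40 00 and the lea as 8D 35 00 40 40 00. The extra ModRM byte is the whole size difference.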

In 64-bit mode you want lea rsi, array, because MASM automatically uses RIP-relative addressing for it. If your code isn't LargeAddressAware and can still take advantage of compact 32-bit absolute addresses, you can still use mov esi, OFFSET array (yes, ESI, not RSI: writing a 32-bit register zero-extends into the full 64-bit register).