Consider the assembly code mov edi, offset newarray.
As far as I have read, this puts the address of newarray into the register edi.
What I don't understand is the term offset: how does the English meaning of this term fit here?
x86 memory addresses are of the form segment:offset, where the offset part is what you put in a normal (general-purpose) register like EDI (or RDI in 64-bit code), or compute with an addressing mode.
In modern systems where we use a flat memory model, the segment base is always 0, and the offset is the whole address, equal to the linear address.
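As a rough illustration, the segmentation step in 32-bit protected mode amounts to adding the segment base to the offset. The Python sketch below models only that addition (it ignores segment limits and permission checks); the function name to_linear and the address used for newarray are hypothetical.

```python
def to_linear(segment_base: int, offset: int, address_bits: int = 32) -> int:
    """Model protected-mode segmentation: linear = base + offset, wrapping at the address width.
    Simplification: no limit or permission checks."""
    mask = (1 << address_bits) - 1
    return (segment_base + offset) & mask

newarray_offset = 0x00404000  # hypothetical address of newarray

# Flat memory model: the base is 0, so the linear address is just the offset.
print(hex(to_linear(0, newarray_offset)))           # 0x404000
# A non-zero base (as FS or GS can have) shifts every access by that base.
print(hex(to_linear(0x10000000, newarray_offset)))  # 0x10404000
```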
x86-64 addressing for mov eax, [rdi + rax*4 + 1234]:
RDI is the base register in the addressing mode, RAX is the index register (scaled by 4), and 1234 is the displacement.
RDI + RAX*4 + 1234 is the effective address calculated by that addressing mode. (Hence lea rax, fs:[rdi + rax*4 + 1234] gives you that value, even if the FS segment base is non-zero: LEA computes only the offset and ignores the segment base.) This is the offset part of the seg:off address.
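The effective-address arithmetic is simple enough to model directly. This is a sketch, not an emulator: the register values are made up, and the wraparound at 2**64 assumes 64-bit address size (real encodings also limit the displacement to a sign-extended 8 or 32 bits).

```python
MASK64 = (1 << 64) - 1

def effective_address(base: int = 0, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """base + index*scale + disp, as an x86-64 addressing mode computes it (mod 2**64).
    The result is the offset; the segment base is added afterwards."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale factor must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & MASK64

# mov eax, [rdi + rax*4 + 1234] with hypothetical register values:
rdi, rax = 0x1000, 3
print(hex(effective_address(base=rdi, index=rax, scale=4, disp=1234)))  # 0x14de
```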
RDI as the base register implies the DS segment. The seg:off to linear translation gives DS_base + offset. That equals the offset, because the CS, DS, ES, and SS bases are fixed to 0 in 64-bit mode, and in 32-bit mode mainstream OSes set them to 0.
The linear address is a virtual address because 64-bit mode requires that paging is enabled.
Virt->phys translation happens by looking up the page-number part (the bits above the bottom 12, for 4 KiB pages) in the page tables, cached by the TLB. (With 4-level paging, virtual addresses are 48 bits wide, while physical addresses can be up to 52 bits.) The resulting physical address is used to access memory (via the cache), or MMIO over PCIe if the physical address is a device address.
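Here is a toy model of the page-number lookup, assuming 4 KiB pages and 4-level paging. The dict-based page_map is a hypothetical stand-in: real hardware walks a tree of page tables (PML4, PDPT, PD, PT), or hits in the TLB, rather than consulting one flat table.

```python
PAGE_SHIFT = 12                      # 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1

def split_virtual(va: int) -> tuple[int, int]:
    """Split a virtual address into (virtual page number, offset within the page)."""
    return va >> PAGE_SHIFT, va & PAGE_MASK

def table_indices(va: int) -> list[int]:
    """4-level paging: four 9-bit indices (PML4, PDPT, PD, PT) taken from bits 47..12."""
    return [(va >> shift) & 0x1FF for shift in (39, 30, 21, 12)]

def translate(va: int, page_map: dict[int, int]) -> int:
    """Toy translation with a flat {virtual page number: physical frame number} dict.
    A missing key (KeyError) stands in for a page fault."""
    vpn, offset = split_virtual(va)
    frame = page_map[vpn]
    return (frame << PAGE_SHIFT) | offset  # the in-page offset passes through unchanged

page_map = {0x404: 0x1234A}                 # hypothetical mapping
print(table_indices(0x404010))              # [0, 0, 2, 4]
print(hex(translate(0x404010, page_map)))   # 0x1234a010
```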
So yes, to index an array, you want RDI = OFFSET my_array in MASM syntax. (Of course in 64-bit mode you'd want a RIP-relative LEA like lea rdi, [my_array], but in 32-bit mode, yes, you'd do mov edi, OFFSET my_array.)
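To see what the RIP-relative form encodes, here is a sketch of the arithmetic an assembler/linker performs: the instruction stores a signed 32-bit displacement from the end of the instruction, not the absolute address that mov edi, OFFSET my_array embeds as an immediate. The addresses are hypothetical, and the 7-byte length assumes the lea rdi, [rip + disp32] encoding (REX.W 48h, opcode 8Dh, ModRM 3Dh, then disp32).

```python
def rip_rel_disp32(insn_addr: int, insn_len: int, target: int) -> int:
    """Displacement a RIP-relative operand must encode so that next_rip + disp == target."""
    next_rip = insn_addr + insn_len      # RIP already points past the current instruction
    disp = target - next_rip
    if not -(1 << 31) <= disp < (1 << 31):
        raise ValueError("target out of range for a signed 32-bit displacement")
    return disp

# Hypothetical layout: lea rdi, [my_array] at 0x401000, my_array at 0x404010.
disp = rip_rel_disp32(0x401000, 7, 0x404010)
print(hex(disp))                                       # 0x3009
print(disp.to_bytes(4, "little", signed=True).hex())   # 09300000 (bytes after 48 8D 3D)
```

Because the code contains no absolute address, it keeps working wherever the loader places the image (for example in a position-independent executable), whereas the absolute immediate in mov edi, OFFSET my_array has to be correct for the load address.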
The OS takes care of making sure the segment base address is 0 for CS, DS, ES, and SS, so [ebx] and [ebp] access the same linear address when EBX = EBP, even though they imply different segments (DS for a base of EBX, SS for a base of EBP).
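The default-segment rule can be written down as a small function. This sketch covers only 32- and 64-bit base registers without segment-override prefixes; the register and segment-base values are hypothetical.

```python
# A base of (E/R)BP or (E/R)SP implies SS; any other base register implies DS.
STACK_BASES = {"esp", "ebp", "rsp", "rbp"}

def default_segment(base_reg: str) -> str:
    return "ss" if base_reg.lower() in STACK_BASES else "ds"

def linear(base_reg: str, regs: dict[str, int], seg_bases: dict[str, int]) -> int:
    """Linear address of [base_reg] in 32-bit protected mode (no override prefix)."""
    seg = default_segment(base_reg)
    return (seg_bases[seg] + regs[base_reg]) & 0xFFFFFFFF

# Flat model as mainstream OSes set it up: DS and SS bases are both 0.
seg_bases = {"ds": 0, "ss": 0}
regs = {"ebx": 0x0012FF00, "ebp": 0x0012FF00}
print(linear("ebx", regs, seg_bases) == linear("ebp", regs, seg_bases))  # True
```

With a non-zero SS base the two accesses would diverge, which is why the OS keeping all four bases at 0 matters.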
Likewise, call ebx fetches code from the same linear address you could read or write with mov dword ptr [ebx], 0C3909090h (three NOPs followed by a RET, because x86 stores the dword in little-endian byte order: 90 90 90 C3).
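A quick check of the byte-order claim: storing the dword writes the least-significant byte first, so the bytes in memory are the ones a later call ebx would execute. The OPCODES table is a deliberately tiny, hypothetical decoder covering only these two single-byte instructions.

```python
value = 0xC3909090
stored = value.to_bytes(4, "little")   # memory order, lowest address first
print(stored.hex(" "))                 # 90 90 90 c3

# Single-byte opcodes: 90h is NOP, C3h is near RET.
OPCODES = {0x90: "nop", 0xC3: "ret"}
print([OPCODES[b] for b in stored])    # ['nop', 'nop', 'nop', 'ret']
```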
Legacy 16-bit real-mode code often does need to make sure the DS segment base is set to match the start of the data section (or of one of the data sections, in a large program). In real mode, writing to a segment register, e.g. with mov ds, ax, sets its base to value << 4, instead of using the value as a selector that indexes the GDT or LDT.
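The real-mode rule can be sketched as follows. It is a simplification: it ignores the wraparound at 1 MiB of 8086-style addressing, and the segment values are only examples. It also shows that different seg:off pairs can name the same linear address, since only base + offset matters.

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Real mode: loading a segment register sets its base to value << 4.
    Simplification: no 1 MiB wraparound."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    base = segment << 4
    return base + offset

# mov ds, ax with AX = 1234h gives DS base 12340h.
print(hex(real_mode_linear(0x1234, 0x0010)))  # 0x12350
# A different seg:off pair aliases the same linear address.
print(hex(real_mode_linear(0x1235, 0x0000)))  # 0x12350
```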
For example, DOS .exe programs often started with something like:
```asm
main PROC
    mov  ax, @data          ; segment for the data section
    mov  ds, ax
    mov  es, ax
    mov  bx, OFFSET other_var
    mov  ax, [bx]           ; relies on DS being set properly
    mov  [some_var], ax     ; also relies on DS; uses the offset of some_var in the addressing mode
    ...
```
You could mov bx, OFFSET some_var before setting DS, and dereference it after. The segment and offset parts are independent.
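To illustrate why the offset can be computed before DS is set but must not be dereferenced until after, here is a sketch. The segment values and the offset of some_var are hypothetical; the real behavior it relies on is that DOS starts an .exe with DS pointing at the program's PSP rather than at its data section.

```python
def real_mode_linear(segment: int, offset: int) -> int:
    return (segment << 4) + offset   # real-mode seg:off -> linear

SOME_VAR_OFFSET = 0x0004  # hypothetical: OFFSET some_var, fixed at assembly/link time
PSP_SEG = 0x0F00          # hypothetical: DS on entry to the .exe (points at the PSP)
DATA_SEG = 0x1010         # hypothetical: value of @data after the loader relocates it

bx = SOME_VAR_OFFSET                     # mov bx, OFFSET some_var: fine before DS is set
wrong = real_mode_linear(PSP_SEG, bx)    # dereferencing [bx] before mov ds, ax
right = real_mode_linear(DATA_SEG, bx)   # dereferencing [bx] after mov ds, ax
print(hex(wrong), hex(right))            # 0xf004 0x10104
```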
Similarly, legacy BIOS MBR bootloaders need to set DS to match the org value they assume; otherwise they access the wrong place in memory.
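The BIOS loads the MBR at linear address 7C00h, which can be reached as 0000:7C00 or 07C0:0000. What matters is that DS agrees with the org the code was assembled with, since the assembler encodes org + position as the offset of every label. A sketch with a hypothetical label 10h bytes into the sector:

```python
def real_mode_linear(segment: int, offset: int) -> int:
    return (segment << 4) + offset   # real-mode seg:off -> linear

BIOS_LOAD_ADDR = 0x7C00   # where the BIOS loads the 512-byte MBR

def label_linear(org: int, pos_in_sector: int, ds: int) -> int:
    """Linear address that [label] reaches: the assembler encodes org + pos as the offset."""
    return real_mode_linear(ds, org + pos_in_sector)

pos = 0x10  # hypothetical label position within the boot sector
print(hex(label_linear(org=0x7C00, pos_in_sector=pos, ds=0x0000)))  # 0x7c10 (correct)
print(hex(label_linear(org=0x0000, pos_in_sector=pos, ds=0x07C0)))  # 0x7c10 (correct)
print(hex(label_linear(org=0x7C00, pos_in_sector=pos, ds=0x07C0)))  # 0xf810 (wrong place)
```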
Real-mode programs with more than 64k of data would need multiple segments, and have to switch DS or ES to use them. This was generally inconvenient compared to having universal 32-bit pointers that any code could use efficiently, and x86-16 is pretty thoroughly obsolete, so it's rare for people to write new programs that actually use x86 segmentation for anything. If you want lots of memory, it's much easier to write 32-bit or 64-bit code.
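For data larger than 64 KiB, one common approach was to recompute a segment:offset pair for each access, for example by normalizing the linear address so the offset stays below 16. The sketch assumes a hypothetical byte array starting at linear address 20000h; index 70001 would not fit in a single 16-bit offset from the array's start.

```python
def normalize(linear: int) -> tuple[int, int]:
    """Split a 20-bit linear address into seg:off with off < 16 (a normalized far pointer)."""
    if not 0 <= linear <= 0xFFFFF:
        raise ValueError("real-mode linear addresses are 20 bits")
    return linear >> 4, linear & 0xF

def element_far_ptr(array_linear: int, index: int, elem_size: int = 1) -> tuple[int, int]:
    """seg:off for element `index` of an array that may span more than 64 KiB."""
    return normalize(array_linear + index * elem_size)

seg, off = element_far_ptr(0x20000, 70001)   # hypothetical array at linear 20000h
print(f"{seg:04X}:{off:04X}")                 # 3117:0001
```

Each such access means loading a new value into DS or ES, which is the inconvenience compared with flat 32-bit pointers.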