I wrote a simple Hello World program in NASM so that I could inspect it with objdump -d
out of curiosity. The program is as follows:
BITS 64
SECTION .text
GLOBAL _start
_start:
mov rax, 0x01
mov rdi, 0x00
mov rsi, hello_world
mov rdx, hello_world_len
syscall
mov rax, 0x3C
syscall
SECTION .data
hello_world: db "Hello, world!", 0x0A
hello_world_len: equ $-hello_world
When I inspected this program, I found that the assembled code uses movabs with the hex value 0x402000 in place of the name. That makes sense, except that it surely means the program knows 'Hello, world!' is going to be stored at 0x402000 every time it is run, and there is no reference to 'Hello, world!' anywhere in the output of objdump -d hello_world (which I have provided below).
I tried rewriting the program: this time I replaced hello_world in the mov rsi line with the literal address, giving mov rsi, 0x402000, and the program still assembled and worked perfectly.
I thought it might be some encoding of the name; however, changing the text 'hello_world' in SECTION .data did not change the outcome either.
I'm more confused than anything: how does it know the address at compile time, and why does it never change, even on recompilation?
(Output of objdump -d hello_world)
./hello_world: file format elf64-x86-64
Disassembly of section .text:
0000000000401000 <_start>:
401000: b8 01 00 00 00 mov $0x1,%eax
401005: bf 00 00 00 00 mov $0x0,%edi
40100a: 48 be 00 20 40 00 00 movabs $0x402000,%rsi
401011: 00 00 00
401014: ba 0e 00 00 00 mov $0xe,%edx
401019: 0f 05 syscall
40101b: b8 3c 00 00 00 mov $0x3c,%eax
401020: 0f 05 syscall
(as you can see, no 'Disassembly of section .data', which further confuses me)
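The string is known at build time too. It exists statically in your executable. The assembler and linker put it at that address in the first place, so of course they know the address! The name hello_world is only a label for that address; it is resolved to a number during the build, which is why renaming it changes nothing. You don't see the string in your listing because objdump -d only disassembles sections that are expected to contain code; objdump -s (or -D) will show the contents of .data as well.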
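To make this concrete, here is a minimal Python sketch that decodes the ten bytes objdump printed for the instruction at 0x40100a. The bytes are copied from the listing in the question; the variable names are just for illustration. It shows that the address is stored directly in the instruction as a little-endian 64-bit immediate, with no trace of the label name.

```python
import struct

# Bytes of the instruction at 0x40100a, copied from the objdump listing.
insn = bytes.fromhex("48 be 00 20 40 00 00 00 00 00")

rex, opcode = insn[0], insn[1]
assert rex == 0x48            # REX.W prefix: 64-bit operand size
assert opcode & 0xF8 == 0xB8  # B8+r: mov r64, imm64
reg = opcode & 0x07           # low 3 bits select the register; 6 is rsi

# The remaining 8 bytes are the immediate, stored little-endian.
(imm64,) = struct.unpack("<Q", insn[2:])
print(f"register number {reg}, immediate {imm64:#x}")
```

This prints register number 6 and immediate 0x402000, which is exactly what objdump rendered as movabs $0x402000,%rsi.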
(And in an ASLR or shared-library environment this would still apply: the toolchain emits a relocation entry so the loader knows there is an address reference there to fix up, and all addresses within the module get shifted by the same amount as needed, so they still stay the same relative to each other.)
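The fix-up idea can be sketched in a few lines of Python. This is a deliberately simplified model, not the real ELF relocation format: the relocation table is just a hypothetical list of byte offsets, the load bias is an invented value, and apply_load_bias is a made-up helper. The addresses and instruction bytes come from the listing in the question.

```python
import struct

# Addresses the linker chose, taken from the objdump listing.
START = 0x401000          # _start
HELLO = 0x402000          # hello_world

# movabs $0x402000,%rsi; the 8-byte immediate begins 2 bytes in.
code = bytearray.fromhex("48 be 00 20 40 00 00 00 00 00")
reloc_offsets = [2]       # hypothetical table: offsets of absolute-address slots

def apply_load_bias(image, offsets, bias):
    """Add bias to every 64-bit address slot listed in offsets."""
    patched = bytearray(image)
    for off in offsets:
        (addr,) = struct.unpack_from("<Q", patched, off)
        struct.pack_into("<Q", patched, off, addr + bias)
    return patched

bias = 0x10000000         # hypothetical shift picked by the loader
patched = apply_load_bias(code, reloc_offsets, bias)
(new_hello,) = struct.unpack_from("<Q", patched, 2)
new_start = START + bias

print(hex(new_hello))              # 0x10402000
print(hex(new_hello - new_start))  # 0x1000, same as HELLO - START
```

Both addresses move by the same bias, so the distance between the code and the string is unchanged.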
And this doesn't mean that every program ever existing will have unique memory locations, nor does it mean that all contents of a program have to idly sit around and use up all of your memory even if they are rarely needed, because this is virtual memory.
The address is only meaningful within your own process, and the memory page in question doesn't have to exist in memory physically. It can be paged in and out as needed, and it's the OS' memory manager's job to decide what to keep in physical memory at what times. Attempting to access an address belonging to a page that's not physically in memory will make the kernel transparently page it in at that point in time. But with such a small program, most likely the whole program will be in memory from the start.
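A small experiment can illustrate that an address only has meaning inside one process. The Python sketch below assumes a Unix-like system where os.fork is available; the function name is made up for this example. After the fork, parent and child both use the same virtual address for the buffer, yet each sees its own contents, because the kernel maps that address to different physical memory once the child writes to it.

```python
import ctypes
import os

def same_address_different_contents():
    buf = ctypes.create_string_buffer(b"parent", 16)
    addr = ctypes.addressof(buf)      # a virtual address in this process

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:                      # child process
        buf.value = b"child"          # write to the same virtual address
        report = f"{ctypes.addressof(buf):#x} {buf.value.decode()}"
        os.write(write_fd, report.encode())
        os._exit(0)

    os.close(write_fd)
    child_report = os.read(read_fd, 256).decode()
    os.close(read_fd)
    os.waitpid(pid, 0)
    parent_report = f"{addr:#x} {buf.value.decode()}"
    return parent_report, child_report

parent_report, child_report = same_address_different_contents()
print("parent:", parent_report)
print("child: ", child_report)
```

The two lines show the same hexadecimal address followed by different strings. The address itself will vary from run to run, since the Python interpreter's heap is not at a fixed location.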
In user-mode code, you will generally never see physical memory addresses. This is entirely abstracted away by the kernel.