I have an assembly hello world program for Mac OS X that looks like this:
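global _main
section .text
_main:
    mov rax, 0x2000004      ; syscall number for write
    mov rdi, 1              ; file descriptor 1 (stdout)
    lea rsi, [rel msg]      ; address of the string
    mov rdx, msg.len        ; number of bytes to write
    syscall
    mov rax, 0x2000001      ; syscall number for exit
    mov rdi, 0              ; exit status 0
    syscall

section .data
msg: db "Hello, World!", 10
.len: equ $ - msg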
I was wondering about the line lea rsi, [rel msg]. Why does NASM force me to do that? As I understand it, msg is just a pointer to some data in the executable, and doing mov rsi, msg would put that address into rsi. But if I replace the line lea rsi, [rel msg] with mov rsi, msg then NASM throws this error (note: I am using the command nasm -f macho64 hello.asm):
hello.asm:9: fatal: No section for index 2 offset 0 found
Why does this happen? What is so special about lea that mov can't do? How would I know when to use each one?
What is so special about lea that mov can't do?
mov reg,imm loads an immediate constant into its destination operand. The immediate is encoded directly in the instruction bytes, e.g. mov eax,someVar would be encoded as B8 EF CD AB 00 if the address of someVar is 0x00ABCDEF. That is, to encode such an instruction with imm being the address of msg, you need to know the exact address of msg. In position-independent code you don't know it a priori.
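To see the immediate sitting inside the instruction bytes, here is a minimal sketch, assuming a flat 64-bit build (for example nasm -f bin, with a listing file requested via -l). The constants are made-up stand-ins for addresses, not values taken from the program in the question:

```nasm
bits 64

; 32-bit immediate: opcode B8 followed by the constant, low byte first.
mov eax, 0x00ABCDEF            ; B8 EF CD AB 00

; 64-bit immediate into rsi: REX.W prefix, opcode BE, then all 8 bytes.
mov rsi, 0x1122334455667788    ; 48 BE 88 77 66 55 44 33 22 11
```

Either way, the constant itself is part of the encoding, so it has to be known (or patched in later by the linker or loader) before the bytes are final.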
mov reg,[expression] loads the value located at the address described by expression. The x86 instruction encoding allows quite complex expressions: in general expression is reg1+reg2*s+displ, where s can be 1, 2, 4, or 8, reg1 and reg2 are general-purpose registers (either may be absent), and displ is an immediate displacement. In 64-bit mode expression can have one more form: RIP+displ, i.e. the address is calculated relative to the next instruction.
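In NASM syntax, the addressing forms described above might look like the following. This is only an illustrative sketch: the label table, its contents, and the register choices are hypothetical.

```nasm
bits 64

section .text
    mov rax, [rbx]                  ; reg1 only
    mov rax, [rbx + rcx*2]          ; reg1 + reg2*s
    mov rax, [rbx + rcx*4 + 16]     ; reg1 + reg2*s + displ
    mov rax, [rel table]            ; RIP + displ (64-bit mode only)

section .data
table: dq 1, 2, 3, 4                ; hypothetical data
```

In every case the brackets mean that mov reads memory at the computed address; the register receives the value stored there, not the address.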
lea reg,[expression] uses this same address calculation, but loads the address itself into reg (unlike mov, which dereferences the calculated address). Thus information that is unavailable at assembly time, namely the absolute address that will be in RIP at run time, can be used by the instruction without knowing its value. The NASM expression lea rsi,[rel msg] gets translated into something like
lea rsi,[rip+(msg-nextInsn)]
nextInsn:
which uses the relative address msg-nextInsn instead of the absolute address of msg, thus allowing the assembler to encode the instruction without knowing the actual address.
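As a small side-by-side sketch of the difference, the two instructions below take the same RIP-relative operand, but only mov reads memory. It assumes NASM's default rel directive, which makes bare [label] references RIP-relative so that rel does not have to be repeated; the label msg mirrors the one in the question.

```nasm
bits 64
default rel                 ; treat [label] as [rel label] from here on

section .text
    lea rsi, [msg]          ; rsi = address of msg (nothing is read from memory)
    mov al,  [msg]          ; al  = byte stored at msg, here 'H'

section .data
msg: db "Hello, World!", 10
```

As a rough rule of thumb: reach for lea with a RIP-relative operand when you want a pointer to a label, and for mov with brackets when you want the data stored at that label.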