This is a follow-up to this question.
I understand that when programming arm64 on macOS, I can't write:
.text
ldr X0, =my_string
.data
my_string:
.asciz "Hello There"
but instead need to change the ldr to be:
adrp X0, my_string@PAGE
add X0, X0, my_string@PAGEOFF
For a normal non-macOS loader, the first would be turned into:
ldr X0, fakeLabel
...
fakeLabel: .quad my_string
and then the value at fakeLabel would be fixed at link time.
Meanwhile, for both macOS and non-macOS, the linker would need to compute the difference between the page that the adrp is on and the page that my_string is on, and put that delta into the adrp instruction.
According to the answer I was given on the other page, the former is "dynamic" loading and thus illegal, while the latter is "static" loading and legal. I'm not sure I understand the difference, since both involve changing instructions. Is the delta between the adrp and my_string somehow fixed? Are text and data always loaded a certain distance apart, even though we don't know where they'll be loaded?
Am I missing something? Is this documented?
The problem isn't with instructions - all of the instruction modifications happen at link-time (=static). None of that is a problem, at link-time you can do whatever you want.
The problem lies with fakeLabel. There's a pointer stored there. Pointers need to be rebased at runtime (=dynamic) due to ASLR. If you are currently in the .text segment and you write ldr xN, =..., then the fakeLabel inserted by the compiler will also be in the .text segment. When the process is launched, the dynamic linker will dutifully try and rebase the pointer there, but since the segment is mapped as readonly, doing so will crash the process.
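To make the difference concrete, here is a small Python model of the arithmetic (not real linker code). It assumes the whole image is moved by a single page-aligned ASLR slide, so the distance between the code and my_string stays whatever the static linker made it. The addresses, the slide value, and the helper names adrp_add_operands and resolve are made up for illustration.

```python
PAGE = 0x1000  # adrp works in 4 KiB pages, which is what @PAGE / @PAGEOFF split on


def adrp_add_operands(pc, target):
    """Values a static linker would encode into adrp (page delta) and add (page offset)."""
    page_delta = (target & ~(PAGE - 1)) - (pc & ~(PAGE - 1))
    page_offset = target & (PAGE - 1)
    return page_delta, page_offset


def resolve(pc, page_delta, page_offset):
    """Address that adrp + add produce at runtime, given the pc of the adrp."""
    return (pc & ~(PAGE - 1)) + page_delta + page_offset


# Hypothetical link-time addresses and an arbitrary page-aligned slide.
code_addr = 0x100003F80    # where the adrp lives
string_addr = 0x100008010  # where my_string lives
slide = 0x4A3C000

delta, offset = adrp_add_operands(code_addr, string_addr)

# adrp + add: the encoded operands stay valid after the slide, nothing to patch.
assert resolve(code_addr + slide, delta, offset) == string_addr + slide

# Literal pool: the stored absolute pointer is stale until someone adds the slide.
literal = string_addr
assert literal != string_addr + slide
rebased_literal = literal + slide  # this write is what dyld would have to do at launch
assert rebased_literal == string_addr + slide
```

The adrp+add pair only encodes a difference, so the slide cancels out. The literal holds an absolute address, so it has to be rewritten in memory at launch, and that write is what fails when the literal sits in a read-only mapping.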
This would not be an issue if fakeLabel was emitted in the data segment, but since PC-relative ldr has a maximum offset of ±1MiB, and since the linker is free to rearrange segments as needed, this is not possible in the general case. It would have to emit adrp+ldr and have the linker fix it back up to PC-relative ldr if possible, but the compiler is not allowed to turn one instruction into two, and if you're using two instructions manually... well, then you might as well go for adrp+add right away and avoid the implicit pointer.
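As a rough illustration of the range limits involved, the following Python sketch checks whether a target is reachable. It relies on the encodings as I understand them: PC-relative ldr has a signed 19-bit immediate scaled by 4 (±1MiB), and adrp has a signed 21-bit immediate counted in 4 KiB pages (±4GiB). The function names and addresses are hypothetical.

```python
def ldr_literal_reachable(pc, target):
    """PC-relative ldr: signed 19-bit immediate, scaled by 4 bytes."""
    offset = target - pc
    return offset % 4 == 0 and -(1 << 20) <= offset < (1 << 20)


def adrp_reachable(pc, target):
    """adrp: signed 21-bit immediate, counted in 4 KiB pages."""
    page_delta = (target >> 12) - (pc >> 12)
    return -(1 << 20) <= page_delta < (1 << 20)


pc = 0x100003F80          # hypothetical address of the load instruction
near = pc + 0x000FFFFC    # last word still inside the 1 MiB window
far = pc + 0x00200000     # 2 MiB away, e.g. a literal placed in a distant data segment

assert ldr_literal_reachable(pc, near)
assert not ldr_literal_reachable(pc, far)
assert adrp_reachable(pc, far)
```

So a literal that ends up more than 1 MiB from the load cannot be reached with a single ldr, whereas an adrp-based pair still reaches it.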