I have this ARM64 program which assembles but segfaults immediately when I run it:
// GNU Assembler, ARM64 Linux
.bss
.lcomm ARRAY, 16
.text
.global _start
_start:
mov x8, 93 // exit sys num
mov x0, 0 // success
svc 0
From brute-force trial & error I managed to fix it by adding this line:
// GNU Assembler, ARM64 Linux
.bss
.lcomm ARRAY, 16
.p2align 12 // why?
.text
.global _start
_start:
mov x8, 93 // exit sys num
mov x0, 0 // success
svc 0
It only works with .p2align 12 (equivalent to .balign 4096) or higher; with .p2align 11 or lower it still segfaults. I understand the padding is likely fixing some misalignment issue, but I don't understand why it must be such a large value. Virtually every other ARM64 example I've seen, both hand-written and produced by compilers, inserts just a .p2align 2 before the .text section, so why do I need .p2align 12 for my tiny program?
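For reference, the relationship between the two directives is just a power of two: .p2align n asks for the same boundary as .balign 2**n. A small Python sketch of that arithmetic (the helper name p2align_to_balign is made up for illustration and is not part of any tool):

```python
def p2align_to_balign(exponent: int) -> int:
    """Return the byte boundary that `.p2align exponent` asks for."""
    return 1 << exponent


# The three values mentioned in the question.
for exponent in (2, 11, 12):
    print(f".p2align {exponent} == .balign {p2align_to_balign(exponent)}")
# prints:
# .p2align 2 == .balign 4
# .p2align 11 == .balign 2048
# .p2align 12 == .balign 4096
```

So the smallest value that works, 12, corresponds to a 4096-byte boundary, while the usual .p2align 2 only guarantees 4-byte (one instruction) alignment.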
Furthermore, I noticed the required size of the padding is inversely proportional to the length of the .text section. For tiny programs like the one above, .p2align 12 is required to make them run without segfaulting; however, the longer the .text section becomes, the smaller I can make the padding, and for programs with thousands of instructions I don't need to add any padding at all!
I'm on an x86_64 macOS machine but I'm compiling and running these programs inside of a Docker container which is built from this Dockerfile:
FROM ubuntu:20.04
RUN apt-get update && apt-get -y install clang qemu gcc-aarch64-linux-gnu
I'm compiling and running the ARM64 programs with:
clang -nostdlib -fno-integrated-as -target aarch64-linux-gnu -s program.s -o program.out && ./program.out
I feel like I'm missing some crucial piece of information regarding GAS, QEMU, ARM64, or ELF executables but I have no clue what it is.
QEMU is getting confused by the program header for your data section:
LOAD off 0x00000000000000c0 vaddr 0x00000000004100c0 paddr 0x00000000004100c0 align 2**16
filesz 0x0000000000000000 memsz 0x0000000000000010 flags rw-
and is failing to actually mmap some writeable memory at that address; the segfault is coming from QEMU itself when it tries to memset() the BSS to zero.
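To make the numbers in that header concrete, here is a minimal Python sketch that decodes them. It assumes 4 KiB pages (the header itself only states a 2**16 segment alignment), and the function and key names are illustrative only; this is not QEMU's loader code.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def describe_load_segment(vaddr: int, filesz: int, memsz: int) -> dict:
    """Split a PT_LOAD segment into its file-backed part and its BSS part."""
    return {
        "page_start": vaddr & ~(PAGE_SIZE - 1),  # page containing vaddr
        "page_offset": vaddr & (PAGE_SIZE - 1),  # offset of vaddr in that page
        "bss_start": vaddr + filesz,             # first byte not backed by the file
        "bss_end": vaddr + memsz,                # one past the last byte of the segment
        "bss_bytes": memsz - filesz,             # bytes that must be zero-filled
    }


# Values copied from the LOAD program header above.
info = describe_load_segment(vaddr=0x4100C0, filesz=0x0, memsz=0x10)
for name, value in info.items():
    print(f"{name:12s} {value:#x}")
# prints:
# page_start   0x410000
# page_offset  0xc0
# bss_start    0x4100c0
# bss_end      0x4100d0
# bss_bytes    0x10
```

Because filesz is 0, nothing in this segment comes from the file: all 0x10 bytes (the 16-byte ARRAY) are BSS that the loader has to provide as zeroed, writeable memory, and they start 0xc0 bytes into a page rather than on a page boundary.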
Since the program runs fine on a real AArch64 Linux kernel, this is a QEMU bug. I've reported it to the upstream mailing list so we'll see if anybody comes up with a fix.