assembly optimization arm64 code-size

Is there any way to shorten this machine code Hello World in AArch64 assembly?


I am working on a compiled AArch64 assembly file of a "Hello World" program for Linux.

I have already shortened it from 504 bytes down to 124 bytes. The only further "optimization" I can think of would be to find instructions that perform two or more tasks in a single instruction.

Currently the machine code in the file (represented in asm) is this:

  mov x8, 64     // __NR_write
  adr x1, hello  //the string, I know the exact address
  mov x2, 10     //string length (actually only "HelloWorld")

j:
  mov x0, 0      // write to stdin happens to work
  svc 0
  mov x8, 93     // __NR_exit
  b j    //the branching saves me one instruction to exit with status = 0

Is there any instruction to shorten anything here?


Solution

  • It might work to use ldp x0, x2, [sp], #16 to pop the top two 64-bit values from the stack, argc and argv[0], into x0 and x2, replacing both mov x0, 0 and mov x2, 10, if you don't mind writing a bunch of binary \0 bytes (or even other garbage) after your string.

    The Linux process startup environment has the stack pointer pointing at argc, and directly above that the argv[] array values. (Not a pointer to argv like main gets: the array itself is on the stack, so the double-word right after argc is argv[0]. Above argv[] is env[].)

    • argc will be 1, so that works for stdout fd, if run normally from a shell with no args.
    • argv[0] is a pointer to stack memory, so as an integer it is far larger than 10, and write() will keep reading bytes until it hits an unmapped page.
      (Linux write does actually copy the preceding bytes to the fd, rather than returning -EFAULT, if a non-zero number of bytes can be written before hitting a fault. It seems to check the readability of later pages only when it gets to them. This is an undocumented implementation detail, but it is what current Linux actually does, at least on x86-64.)

    This might even still exit with 0 status, assuming it's run with no args and the ldp sits at the loop label j so that it runs again before the exit call. Post-increment addressing advances sp by 16, so the second ldp loads x0 = argv[1] = NULL, which becomes the exit status. (It also loads env[0] into x2; we know we won't segfault from reading past the top of the stack region because env[] is up there.)
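
    Put together, the idea might look like the following untested sketch. It reuses the question's label names (j, hello) and assumes the ldp sits at the loop label so that it runs before both system calls. The .ascii directive is only a stand-in for however the string is actually placed in the file.

    ```asm
      mov x8, 64             // __NR_write
      adr x1, hello          // buffer address
    j:
      ldp x0, x2, [sp], 16   // pass 1: x0 = argc (fd), x2 = argv[0] (huge length)
                             // pass 2 (no args): x0 = argv[1] = NULL (exit status),
                             //                   x2 = env[0] (ignored by exit)
      svc 0
      mov x8, 93             // __NR_exit
      b j                    // loop back once to make the exit call
    hello:
      .ascii "HelloWorld"    // assumed placement; the bytes after it get written too
    ```

    That is six instructions instead of seven, since the one ldp stands in for both mov x0, 0 and mov x2, 10. Stepping by 16 also keeps sp 16-byte aligned for the second load, which AArch64 requires for sp-based accesses when stack alignment checking is enabled.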

    But it's not necessary to exit(0) to get text printed; any exit status can work. (If you don't mind the noise from the shell, you could even arrange your program so it segfaults instead of making an exit system call, saving all the instructions after the first svc 0!)
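
    As a rough illustration of that last option, here is an untested sketch that drops the exit call entirely. With no loop there is no second load, so a plain ldp without post-increment is used. The label and .ascii placement are assumptions based on the question's code.

    ```asm
      mov x8, 64          // __NR_write
      adr x1, hello       // buffer address
      ldp x0, x2, [sp]    // x0 = argc (fd), x2 = argv[0] (huge length)
      svc 0               // write(argc, hello, argv[0])
                          // no exit call: execution falls through into the bytes below
    hello:
      .ascii "HelloWorld" // assumed placement of the string
    ```

    After the svc returns, the CPU tries to execute whatever bytes follow it, so exactly how the process dies (illegal instruction or segfault) depends on what those bytes happen to decode to.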


    If you ran the program with no args at all via a manual execve, so that argc = 0 and argv[0] = NULL, it would call write(0, hello, 0) and thus wouldn't print anything.

    But if you ran it with one arg (not counting the argv[0] that shells pass implicitly), argc would be 2 and it would print to stderr. With 2 or more args, it would try to write to an fd that isn't open, and write would return -EBADF, as you could see under strace.