Tags: linux, ubuntu, assembly, arm64

How to use the stack in ARM64 assembly


I am trying to write a nested loop that prints a 10x10 grid of dots, but the assembler throws an error when I use the PUSH and POP instructions in the _print subroutine. Is this the right way to do it, or am I doing something wrong? Please help.

.equ WIDTH, 10
.equ HEIGHT, 10

.data
DOT: .ascii "."
BLOCK: .ascii "$"
NEW_LINE: .ascii "\n"

.text
.global _start
_start:
_renderFrame:
    mov x0, HEIGHT
    mov x1, WIDTH
    bl _height

_height:
    cmp x0, 0
    beq _exit
    sub x0, x0, 1
    bl _width
    ldr x3, =NEW_LINE
    bl _print
    bl _height

_width:
    cmp x1, 0
    beq _height
    sub x1, x1, 1
    ldr x3, =DOT
    bl _print
    bl _width

_print:
    push {x0, x1, x2}
    mov x8, 0x40
    mov x0, 1
    mov x1, x3
    mov x2, 1
    svc 0
    pop {x0, x1, x2}
    ret

_exit:
    mov x8, 0x5d
    mov x0, 0
    svc 0

The error is shown below:

main.asm: Assembler messages:
main.asm:35: Error: unknown mnemonic `push' -- `push {x0,x1,x2}'
main.asm:41: Error: unknown mnemonic `pop' -- `pop {x0,x1,x2}'

Note: I first tried to run this assembly on macOS, but there seem to be few support articles online for macOS, so I am running this code in an Ubuntu Docker container with its built-in assembler and linker. The print and exit system calls work fine.


Solution

  • There are no push or pop instructions in ARM64 assembly. Perhaps you have mixed it up with ARM32, which does have them.

    To push and pop on ARM64, you can use the normal load and store instructions with pre-indexed (decrement before the store) and post-indexed (increment after the load) addressing on the sp register. Note that sp must stay 16-byte aligned whenever it is used as the base of a load or store, so always adjust it by multiples of 16 bytes. This makes it convenient to push and pop two 64-bit registers at a time, using the stp store-pair and ldp load-pair instructions.

    Example:

        stp x0, x1, [sp, #-16]!  // push x0 and x1
        ldp x0, x1, [sp], #16    // pop them again
    

    If you want to push/pop more than two registers, you can either repeat this, or use the first/last store/load to adjust the stack pointer by the full amount needed for all the registers and then use non-pre/post-incrementing stores and loads for the others.

    Example:

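        stp x0, x1, [sp, #-32]!  // reserve 32 bytes and store x0, x1 at the bottom
        str x2, [sp, #16]        // store x2 above x0 and x1
        ldr x2, [sp, #16]        // reload x2
        ldp x0, x1, [sp], #32    // reload x0, x1 and release the 32 bytes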
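
    Applied to your _print routine, this could look roughly like the sketch below. I am assuming you want to keep passing the buffer address in x3 and to preserve x0, x1 and x2 across the call, as your push/pop attempt suggests. Note that x8 is still overwritten and is not saved here.

        _print:
            stp x0, x1, [sp, #-32]!  // reserve 32 bytes; x0 at sp+0, x1 at sp+8
            str x2, [sp, #16]        // x2 at sp+16 (sp+24 is unused padding)
            mov x8, 0x40             // write system call, as in your code
            mov x0, 1                // file descriptor 1 (stdout)
            mov x1, x3               // address of the byte to print
            mov x2, 1                // length: one byte
            svc 0
            ldr x2, [sp, #16]        // restore x2
            ldp x0, x1, [sp], #32    // restore x0, x1 and release the 32 bytes
            ret

    Reserving 32 bytes for three registers wastes 8 bytes, but it keeps sp 16-byte aligned.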
    

    Unrelated, but the use of bl in your code seems confused. bl is meant for function calls; it saves a return address into the link register x30 and then branches to the target address, which should be the first instruction of a function or subroutine. The called function should then finish with ret to branch back to the address stored in x30, at which point execution continues with the instruction following the bl. So it's unclear why you're using it for jumps to _width and _height which are not subroutines and don't return. If you want an unconditional branch, use b instead of bl.

    But consider this part of your code:

    _height:
        cmp x0, 0
        beq _exit
        sub x0, x0, 1
        bl _width
        ldr x3, =NEW_LINE // this and following lines are unreachable
        bl _print
        bl _height
    

    Since _width has no ret instruction, it will not return (indeed it overwrites x30 by executing a bl of its own) and thus the code following bl _width cannot ever be executed. So I think you need to put some more thought into your program's control flow.
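
    As one possible way to restructure it, here is a minimal sketch that uses plain branches (b, cbz) for the loops and keeps bl only for the call to _print. The labels row_loop, col_loop and row_done are names I made up for illustration. I am also assuming you meant to reload the column counter for every row, which your original code does not do.

        .equ WIDTH, 10
        .equ HEIGHT, 10

        .data
        DOT: .ascii "."
        NEW_LINE: .ascii "\n"

        .text
        .global _start
        _start:
            mov x0, HEIGHT           // rows remaining
        row_loop:
            cbz x0, _exit            // no rows left: leave the outer loop
            mov x1, WIDTH            // columns remaining in this row
        col_loop:
            cbz x1, row_done         // no columns left: finish the row
            ldr x3, =DOT
            bl _print                // real subroutine call; returns here
            sub x1, x1, 1
            b col_loop               // plain jump, no return address needed
        row_done:
            ldr x3, =NEW_LINE
            bl _print
            sub x0, x0, 1
            b row_loop

        _print:                      // prints the byte at address x3
            stp x0, x1, [sp, #-32]!  // save x0, x1 and reserve 32 bytes
            str x2, [sp, #16]        // save x2
            mov x8, 0x40             // write system call
            mov x0, 1                // stdout
            mov x1, x3
            mov x2, 1
            svc 0
            ldr x2, [sp, #16]        // restore x2
            ldp x0, x1, [sp], #32    // restore x0, x1 and release the stack
            ret

        _exit:
            mov x8, 0x5d             // exit system call
            mov x0, 0
            svc 0

    Because _start never returns, it does not matter that bl overwrites x30 there. A subroutine that itself calls another with bl would need to save x30 on the stack first and restore it before its own ret.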