Tags: macos, assembly, arm64, calling-convention, abi

Consequence of violating macOS's ARM64 calling convention


I'm porting some AArch64/ARM64/Apple Silicon assembly code from Linux to macOS.

This code uses all 31 available registers (stack pointer doesn't count) to avoid almost all cases of spilling; the Linux calling conventions allow me to use that many registers.

If pressed, I would admit that spilling one extra register (bringing it down to 30 registers used) is feasible, as performance would be minimally affected. If restricted to 29 or fewer available registers, though, performance would suffer considerably more. Thus, I'd really like to have at least 30 registers available, and ideally 31.

I just learned from this official Apple document that two extra registers are reserved, beyond what the Linux calling conventions require:

Respect the Purpose of Specific CPU Registers

The ARM standard delegates certain decisions to platform designers. Apple platforms adhere to the following choices:

  • The platforms reserve register x18. Don’t use this register.
  • The frame pointer register (x29) must always address a valid frame record. Some functions, such as leaf functions or tail calls, may opt not to create an entry in this list. As a result, stack traces are always meaningful, even without debug information.

Despite these requirements, my code appears to run fine while using both x18 and x29.

Now, I fully understand that ignoring such ABI requirements is a Very Bad Thing (TM). However, I'd like to understand exactly how the code may break due to the use of each of x18 and x29.

For instance, from reading the above documentation, my understanding is that x29 is there to support debugging or crash dumps. Suppose I didn't care about debugging this function in particular (which I actually don't), or whether any generated crash dumps are meaningful. In that case, is there any harm to using x29?

As for x18, any idea what it is used for? I'd hypothesize (with zero supporting evidence) that if an interrupt or context switch occurs while this code is running, x18 is not saved and restored, which would corrupt the results of my function. That would be a more serious condition, and I'd heed the advice not to use x18 in that case.

Also note that the code in question is a leaf function, so there is no issue with breaking any functions called from within it.


Solution

  • You can safely clobber x29 if broken backtraces are an acceptable loss to you.

    x18 is a different story.
    On macOS, Rosetta uses it, so Apple can't have the kernel clobber it anymore without at least refactoring Rosetta. They also have a kernel test to make sure x18 is restored "on hardware that supports it". So far, that covers all hardware that supports arm64 macOS, and all macOS versions that support Apple Silicon have this behaviour enabled.
    On iOS though, there is hardware that does not support it, specifically the A11 chip series and older. On those chips, the kernel is configured with __ARM_KERNEL_PROTECT__, which enables a Spectre mitigation that uses x18 on all exception handlers, even async ones, before the kernel gets a chance to spill any registers. So unless you're running with interrupts off, your x18 can be zeroed at any point in time. In addition, even on A12 and later, iOS versions before iOS 14.0 did clobber x18 intentionally.

    Now if you checked out the linked test, you might be tempted to check the sysctl hw.optional.arm_kernel_protect at runtime, but unfortunately that is only exported on DEVELOPMENT and DEBUG configurations of XNU.

    So if you're targeting iOS, you cannot use x18. If you're targeting macOS, then you can use it for the time being, but that may change in the future. You could try and detect such change by doing the same thing the test does: set x18 to a certain value, call sched_yield(), then check the value. But again, that relies on all exceptions treating x18 the same, and while they currently do, that too may change in the future.
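The probe described above (set x18, call sched_yield(), read x18 back) could be sketched in C roughly as follows. This is a minimal sketch rather than the kernel test itself: the function name, the marker value, and the iteration count are all made up for illustration, and it assumes a compiler on Apple arm64 that never allocates x18 itself. On any other target it simply reports that the probe does not apply.

```c
#include <sched.h>
#include <stdint.h>

/* Hypothetical helper, not part of any Apple API.
 * Returns  1 if x18 kept its value across every sched_yield() call,
 *          0 if it was clobbered at least once,
 *         -1 if the probe does not apply to this build target. */
int x18_survives_yield(void)
{
#if defined(__APPLE__) && defined(__aarch64__)
    const uint64_t marker = 0x1234abcd5678ef90ULL; /* arbitrary test pattern */
    uint64_t saved, seen;
    int survived = 1;

    /* Remember whatever x18 held so it can be put back afterwards. */
    __asm__ volatile("mov %0, x18" : "=r"(saved));

    for (int i = 0; i < 16 && survived; i++) {  /* 16 is an arbitrary count */
        __asm__ volatile("mov x18, %0" : : "r"(marker));
        sched_yield();                          /* enters and leaves the kernel */
        __asm__ volatile("mov %0, x18" : "=r"(seen));
        survived = (seen == marker);
    }

    __asm__ volatile("mov x18, %0" : : "r"(saved));
    return survived;
#else
    return -1;
#endif
}
```

A result of 1 only says that this particular exception path preserved x18 on this run. As noted above, it is no guarantee that every exception path does, or that a later OS release will.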

    Update
    macOS 13 did indeed change this! While macOS 11 and 12 would unconditionally preserve x18, macOS 13 now has more complicated rules. x18 is now only preserved for processes running under Rosetta as well as processes that were either built against a macOS 12 SDK or older, or hold the com.apple.private.uexc or com.apple.private.custom-x18-abi entitlements.

    The same rules apply to iOS 16, with the caveats that there is no Rosetta and you cannot run binaries built against the macOS SDK. So as of iOS 16, x18 has to be considered off-limits again for all devices.
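To make the macOS 13 conditions above easier to scan, here is how they read as a single predicate. This is only a restatement of the prose, not kernel code: the struct, its field names, and the function name are hypothetical, and a real process would have no supported way to query some of these facts about itself.

```c
#include <stdbool.h>

/* Hypothetical description of a process; every field name is illustrative. */
struct process_info {
    bool runs_under_rosetta;
    int  sdk_major;                      /* macOS SDK major version it was built against */
    bool has_uexc_entitlement;           /* com.apple.private.uexc */
    bool has_custom_x18_abi_entitlement; /* com.apple.private.custom-x18-abi */
};

/* True if, per the rules described above, macOS 13 preserves x18 for the process. */
bool x18_preserved_on_macos13(const struct process_info *p)
{
    return p->runs_under_rosetta
        || p->sdk_major <= 12
        || p->has_uexc_entitlement
        || p->has_custom_x18_abi_entitlement;
}
```

Under these rules, an ordinary native binary built against the macOS 13 SDK with neither entitlement gets false, which is the case that matters for newly built code.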