AArch64: is there a red zone on Linux, and if so, is it 16 or 128 bytes?

There doesn't seem to be any mention of a "Red Zone" for AArch64 in the ABI. However, Microsoft makes reference to a 16-byte red zone for AArch64, and Apple claims a 128-byte red zone in Writing ARM64 code for Apple platforms and in the iOS discussion. A 128-byte red zone is similarly discussed in Google-groups: AArch64 debug build woes. Apart from the Microsoft and Apple links, though, these are just "discussion", and none of them references any primary documentation.

I can't find any reference in the developer.arm.com documentation. Is there a valid red zone on Linux for AArch64, and is it similar to the one on x86_64? What size is it? (It would make sense for it to be consistent with the x86_64 red zone, if it exists.) Is there a primary ABI document somewhere that addresses the red zone on AArch64 for Linux?

It would sure be handy for scratch use without adjusting sp, if it exists and there is some assurance that it isn't overwritten by interrupt handling, some system calls, etc. That is basically what I'm trying to nail down.

Solution

To the best of my knowledge, the Linux AArch64 ABI does not include a red zone. For instance, you can see that gcc and clang won't use a red zone even for simple code where it would be obvious to do so, and for which the red zone is used on x86-64.
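For illustration (this is not the original linked example), here is a minimal sketch of the kind of leaf function where a red zone would be the obvious choice. The name `add_one_via_stack` is made up; the `volatile` local is only there to force a stack slot. The code-generation notes in the comments are what I would typically expect from gcc or clang at -O2, not a guarantee for every compiler version.

```c
/*
 * Leaf function that needs one stack slot and calls nothing.
 * Compile with e.g. `gcc -O2 -S` for each target and compare:
 *  - x86-64 (System V): the slot is typically addressed below rsp
 *    (something like [rsp-4]) with no adjustment of rsp, i.e. the red zone.
 *  - AArch64 Linux: the compiler typically emits `sub sp, sp, #16` first
 *    and `add sp, sp, #16` afterwards, i.e. it allocates a real frame.
 */
int add_one_via_stack(int x)
{
    volatile int tmp = x;   /* volatile forces a store to and load from the stack */
    return tmp + 1;
}
```

Calling `add_one_via_stack(41)` returns 42 on either target; the interesting part is the generated assembly, not the result.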

Also, here is a test program which, at least on my device (archlinuxarm aarch64 on a Raspberry Pi 4B), shows that the region immediately below sp is overwritten when a signal is handled.
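The linked program is not reproduced here. As a rough, portable sketch of a related observation, the code below only checks that a signal handler's frame lands on the same stack at a lower address than a local of the interrupted function. That placement is why data parked below sp is at risk. It assumes a downward-growing stack and no alternate signal stack (no `sigaltstack`/`SA_ONSTACK`), and the names `on_signal` and `handler_runs_below_caller` are made up for the example. It does not itself detect an overwrite, which would need target-specific assembly.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdint.h>
#include <string.h>

/* Address of a local variable inside the handler, recorded when it runs. */
static volatile uintptr_t handler_local_addr;

static void on_signal(int signo)
{
    volatile int marker = signo;              /* lives in the handler's own frame */
    handler_local_addr = (uintptr_t)&marker;
}

/*
 * Returns 1 if the handler's frame sat below a local of the interrupted
 * function, 0 if it did not, and -1 if the handler could not be run.
 */
int handler_runs_below_caller(void)
{
    volatile int caller_local = 0;
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old) != 0)
        return -1;

    handler_local_addr = 0;
    raise(SIGUSR1);                           /* handler runs before raise() returns */
    sigaction(SIGUSR1, &old, NULL);           /* restore the previous disposition */

    if (handler_local_addr == 0)
        return -1;
    return handler_local_addr < (uintptr_t)&caller_local;
}
```

Under the stated assumptions this returns 1 on both x86-64 and AArch64 Linux. The difference between the two is that the x86-64 kernel skips the 128-byte red zone before building the signal frame, whereas on AArch64 Linux nothing below sp is skipped.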

ARM's current official AAPCS ABIs are posted on GitHub. They don't provide for a red zone. In fact, the standard specifically forbids code from accessing the region below SP:

The active region of T's stack is the area of memory delimited by the half-open interval [T.SP, T.base). The active region is empty when T.SP is equal to T.base.

The inactive region of T's stack is the area of memory denoted by the half-open interval [T.limit, T.SP). The inactive region is empty when T.SP is equal to T.limit.

[...]

No thread is permitted to access (for reading or for writing) the inactive region of S.
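To make the half-open intervals concrete, here is a small sketch that models the two regions as plain address ranges. The struct and function names are hypothetical and only mirror the wording of the quoted text; nothing like this appears in the ABI document itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one thread's stack, using the names from the quote. */
struct stack_bounds {
    uintptr_t limit;  /* lowest address of the stack */
    uintptr_t sp;     /* current stack pointer */
    uintptr_t base;   /* one past the highest address of the stack */
};

/* Active region: [sp, base). Code may read and write here. */
bool in_active_region(const struct stack_bounds *t, uintptr_t addr)
{
    return addr >= t->sp && addr < t->base;
}

/* Inactive region: [limit, sp). Off limits, so no red zone. */
bool in_inactive_region(const struct stack_bounds *t, uintptr_t addr)
{
    return addr >= t->limit && addr < t->sp;
}
```

With these definitions the address sp itself is active, while sp - 8 (where a red zone would live) falls in the inactive region.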

As far as I'm aware, Linux follows the AAPCS ABI exactly. There are no differences mentioned in the Linux kernel documentation, except for a feature allowing tagged pointers to be passed to system calls, which is not relevant here.

Microsoft and Apple use variant AArch64 ABIs that mostly follow AAPCS but have some documented differences, and in each case they do provide for some sort of red zone.

As you saw (and also in more official docs), Microsoft's ABI provides for a 16-byte red zone, but it is reserved for debuggers and the application may not use it. So all it means is that signal handling code, etc, must not overwrite that region.

For Apple, the page you linked exists precisely to describe how their ABI differs from AAPCS. Indeed, Apple's ABI provides a 128-byte red zone which the application is allowed to use. (Though I can't seem to create code that convinces Apple's clang compiler to use it.)

In my opinion, AArch64 has less of a need for a red zone than does x86-64. First, AArch64 has roughly twice as many registers, so you're less likely to need to spill data to the stack in the first place. Second, the pre/post-indexed addressing modes available in the load/store instructions can often let you adjust the stack pointer for zero extra cost in code size or cycles. If your function needs scratch space on the stack, your first store instruction can pre-decrement the stack pointer to allocate space, and your last access can post-increment to deallocate it. (Assuming you know statically which store is the first, which load is the last, and how much space is needed - but that's likely to hold for the kinds of simple functions that would otherwise benefit from a red zone.)
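As a sketch of the pre-decrement/post-increment idea, the function below wraps the two instructions in GCC/clang-style inline assembly. The name `stack_roundtrip` is made up, and the non-AArch64 branch is only a placeholder so the file still builds elsewhere; it does not demonstrate anything on other targets. The 16-byte step is chosen to keep sp 16-byte aligned, as AArch64 requires when sp is used as a base register.

```c
#include <stdint.h>

/*
 * Spill a value to freshly allocated stack space and reload it,
 * allocating and freeing the space as part of the store and the load.
 */
uint64_t stack_roundtrip(uint64_t value)
{
#if defined(__aarch64__)
    uint64_t out;
    __asm__ volatile(
        "str %1, [sp, #-16]!\n\t"   /* pre-index: sp -= 16, then store at [sp] */
        "ldr %0, [sp], #16\n\t"     /* post-index: load from [sp], then sp += 16 */
        : "=r"(out)
        : "r"(value)
        : "memory");
    return out;
#else
    return value;                   /* placeholder for non-AArch64 builds */
#endif
}
```

The scratch slot is always at or above sp while it is in use, so a signal arriving between the two instructions cannot clobber it, and no separate `sub`/`add` on sp is needed.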