
How can jmp_buf on M1 Macs be decoded?


I am playing around with longjmp and setjmp on my M1 MacBook Air. On an x86_64 Linux machine, setjmp populates a jmp_buf struct which has a long[] containing 'mangled' register values. Going through the glibc code, I was able to decode those values to get the stack pointer and frame pointer for instance.

On my M1 MacBook Air, this jmp_buf type seems to be an int[37] according to lldb. I can see the values and print them, but none of them match the stack pointer or frame pointer, though some are close.

I am looking for how to decode the jmp_buf array on an M1 Mac and get the stack pointer and frame pointer. Any source code would be welcome as well. So far, I've looked through glibc, specifically the sysdeps/aarch64 directory (the x86_64 directory was what allowed me to decode on my Linux machine), and this mirror of Apple's open source code. None of the jmp_buf structs match and I have been unable to determine whether mangling/munging is occurring.

I have:

#include <csetjmp>
#include <iostream>

int main() {

    jmp_buf reg;
    setjmp(reg);

    int foo = 5;
    std::cout << &foo << std::endl; // <--- Location on the stack, looking for something close to this

    for (int offset = 0; offset < 37; offset++) {
        std::cout << offset << ": "
                  << (void*)reg[offset]    // <--- Assumes registers are stored directly
                  << ", " 
                  << (void*)reinterpret_cast<long*>(reg)[offset]  // <--- Int array for some reason but registers are 64 bits, so maybe they're just next to each other?
                  << std::endl;
    }

    return 0;
}

Which prints something like:

% ./a.out                          
0x30c890358
0: 0xc6ac510, 0x10c6ac510
1: 0x1, 0x2918cc39c9b56814
2: 0xffffffffc9b56814, 0x2918cc39c9b56f44
3: 0x2918cc39, 0x10c6adc80
4: 0xffffffffc9b56f44, 0x30c890610
5: 0x2918cc39, 0x10c6adc60
6: 0xc6adc80, 0x1042221a0
7: 0x1, 0x2918cc3bc11e4ddb
8: 0xc890610, 0x1
9: 0x3, 0x37f00001f80
10: 0xc6adc60, 0x300000000
11: 0x1, 0x200000004
12: 0x42221a0, 0x10c6ac100
13: 0x1, 0x10c6ac100
14: 0xffffffffc11e4ddb, 0x100
15: 0x2918cc3b, 0x0
16: 0x1, 0x733d5f6c888800ad
17: 0x0, 0x10c6ac510
18: 0x1f80, 0x10c6adc60
19: 0x37f, 0x733d5f6c888800ad
20: 0x0, 0x30c8906a0
21: 0x3, 0x204580310
22: 0x4, 0x0
23: 0x2, 0x0
24: 0xc6ac100, 0x0
25: 0x1, 0x0
26: 0xc6ac100, 0x20461bde0
27: 0x1, 0x42000000
28: 0x100, 0x204580443
29: 0x0, 0x204612010
30: 0x0, 0x30c890490
31: 0x0, 0x20457a000
32: 0xffffffff888800ad, 0x20457a000
33: 0x733d5f6c, 0x20461bde0
34: 0xc6ac510, 0x40000000
35: 0x1, 0x20458049d
36: 0xc6adc60, 0x204612040

I am expecting there to be a value in the jmp_buf array that is only a few bytes away from the address of foo. Offset #4 when interpreted as an array of long gets close but is farther than I would expect.

I am looking for the offset definitions and any demangling of the values that needs to happen.


Solution

  • Oh this is devious on so many levels.

    For a start: your code is running in Rosetta.

    My educated guess is that you're invoking the compiler from an IDE (VS Code?) or a terminal emulator that runs as x86_64. The compiler then also runs as x86_64 and, with no explicit target arch flag, defaults to targeting x86_64. Pass -arch arm64 to cc/c++/gcc/g++/clang/clang++ to target arm64 explicitly, or prefix the compiler invocation with arch -arm64 [...] to run the entire process hierarchy natively.
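
    A quick way to confirm this from inside the program is sketched below. The helper names compiled_arch and running_translated are made up for this example. The sysctl key sysctl.proc_translated is the one Apple documents for detecting Rosetta; the non-Apple branch is only there so the sketch compiles elsewhere.

    ```c
    #include <stddef.h>
    #if defined(__APPLE__)
    #include <sys/types.h>
    #include <sys/sysctl.h>
    #endif

    /* Hypothetical helper: the architecture the compiler targeted,
     * decided at compile time from predefined macros. */
    const char *compiled_arch(void)
    {
    #if defined(__x86_64__)
        return "x86_64";
    #elif defined(__aarch64__) || defined(__arm64__)
        return "arm64";
    #else
        return "other";
    #endif
    }

    /* Hypothetical helper: 1 if the process is translated by Rosetta,
     * 0 if it runs natively, -1 if this cannot be determined
     * (which includes every non-Apple system). */
    int running_translated(void)
    {
    #if defined(__APPLE__)
        int translated = 0;
        size_t size = sizeof(translated);
        if (sysctlbyname("sysctl.proc_translated", &translated, &size, NULL, 0) != 0)
            return -1;
        return translated;
    #else
        return -1;
    #endif
    }
    ```

    If the guess above is right, the binary from the question would report "x86_64" and 1, while a build with -arch arm64 should report "arm64" and 0.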

    Now, how did I determine that your code was running under Rosetta? It's what Apple calls "pointer munging". The official Apple source dumps are published at github.com/apple-oss-distributions, and setjmp and longjmp are implemented in src/setjmp in libplatform, with hand-rolled assembly implementations for each architecture. The arm64 implementation is this:

    ENTRY_POINT(__longjmp)
        ldp     x19, x20,   [x0, JMP_r19_20]
        ldp     x21, x22,   [x0, JMP_r21_22]
        ldp     x23, x24,   [x0, JMP_r23_24]
        ldp     x25, x26,   [x0, JMP_r25_26]
        ldp     x27, x28,   [x0, JMP_r27_28]
        ldp     x10, x11,   [x0, JMP_fp_lr]
        ldr     x12,        [x0, JMP_sp_rsvd]
        ldp     d8, d9,     [x0, JMP_d8_d9]
        ldp     d10, d11,   [x0, JMP_d10_d11]
        ldp     d12, d13,   [x0, JMP_d12_d13]
        ldp     d14, d15,   [x0, JMP_d14_d15]
        _OS_PTR_MUNGE_TOKEN(x16, x16)
        _OS_PTR_UNMUNGE(fp, x10, x16)
        _OS_PTR_UNMUNGE(lr, x11, x16)
        _OS_PTR_UNMUNGE(x12, x12, x16)
        ldrb        w16, [sp]   /* probe to detect absolutely corrupt stack pointers */
        mov     sp, x12
        cmp     w1, #0
        csinc   w0, w1, wzr, ne
        ret
    

    It makes quite heavy use of macros, so here's the raw disassembly of _longjmp from /usr/lib/system/libsystem_platform.dylib:

    ;-- __longjmp:
    0x00001d68      135040a9       ldp x19, x20, [x0]
    0x00001d6c      155841a9       ldp x21, x22, [x0, 0x10]
    0x00001d70      176042a9       ldp x23, x24, [x0, 0x20]
    0x00001d74      196843a9       ldp x25, x26, [x0, 0x30]
    0x00001d78      1b7044a9       ldp x27, x28, [x0, 0x40]
    0x00001d7c      0a2c45a9       ldp x10, x11, [x0, 0x50]
    0x00001d80      0c3040f9       ldr x12, [x0, 0x60]
    0x00001d84      0824476d       ldp d8, d9, [x0, 0x70]
    0x00001d88      0a2c486d       ldp d10, d11, [x0, 0x80]
    0x00001d8c      0c34496d       ldp d12, d13, [x0, 0x90]
    0x00001d90      0e3c4a6d       ldp d14, d15, [x0, 0xa0]
    0x00001d94      70d03bd5       mrs x16, tpidrro_el0
    0x00001d98      101e40f9       ldr x16, [x16, 0x38]
    0x00001d9c      5d0110ca       eor x29, x10, x16
    0x00001da0      7e0110ca       eor x30, x11, x16
    0x00001da4      8c0110ca       eor x12, x12, x16
    0x00001da8      f0034039       ldrb w16, [sp]
    0x00001dac      9f010091       mov sp, x12
    0x00001db0      3f000071       cmp w1, 0
    0x00001db4      20149f1a       csinc w0, w1, wzr, ne
    0x00001db8      c0035fd6       ret
    

    So the registers fp, lr and sp are stored at offsets 0x50, 0x58 and 0x60, but they're also XORed with a value loaded from [tpidrro_el0, 0x38]. The definitions of those MUNGE macros can be found in xnu/libsyscall/os/tsd.h, but they really don't tell you more than [tpidrro_el0, 0x38] either. It's just a per-process cookie that's XORed into those values. Here is what your program prints when it actually runs as arm64:

    0x16b9bb0f0
    0: 0x4446fbc, 0x104446fbc
    1: 0x1, 0x104450000
    2: 0x4450000, 0x104451910
    3: 0x1, 0x16b9bb2e0
    4: 0x4451910, 0x1a91ea396
    5: 0x1, 0x16b9bb260
    6: 0x6b9bb2e0, 0x1
    7: 0x1, 0x0
    8: 0xffffffffa91ea396, 0x0
    9: 0x1, 0x0
    10: 0x6b9bb260, 0x6f5a9a069b197d28
    11: 0x1, 0x2a649a06f4c6a30c
    12: 0x1, 0x6f5a9a069b197c08
    13: 0x0, 0x0
    14: 0x0, 0x0
    15: 0x0, 0x0
    16: 0x0, 0x0
    17: 0x0, 0x0
    18: 0x0, 0x0
    19: 0x0, 0x0
    20: 0xffffffff9b197d28, 0x0
    21: 0x6f5a9a06, 0x0
    22: 0xfffffffff4c6a30c, 0x100000000
    23: 0x2a649a06, 0x104450000
    24: 0xffffffff9b197c08, 0x31232f62314200ab
    25: 0x6f5a9a06, 0x16b9bb430
    26: 0x0, 0x1a916ff28
    27: 0x0, 0x0
    28: 0x0, 0x0
    29: 0x0, 0x0
    30: 0x0, 0x1045dddd8
    31: 0x0, 0x40000000
    32: 0x0, 0x10454a0c0
    33: 0x0, 0x1045d40b0
    34: 0x0, 0x104544000
    35: 0x0, 0x1045dddd8
    36: 0x0, 0x42000000
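
    The offsets from the arm64 disassembly can be written down as a struct, which makes it easier to see which slots of the dump hold what. This is only a sketch read off the ldp/ldr offsets above: the struct, its field names and ptr_unmunge are invented for illustration, and Apple's headers expose nothing but an opaque int array.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Assumed layout of the arm64 jmp_buf, taken from the load offsets in
     * the __longjmp disassembly. Names are hypothetical. */
    struct arm64_jmp_layout {
        uint64_t x19_x28[10]; /* 0x00..0x48: callee-saved x19..x28, stored as-is */
        uint64_t fp_munged;   /* 0x50: x29 XOR token */
        uint64_t lr_munged;   /* 0x58: x30 XOR token */
        uint64_t sp_munged;   /* 0x60: sp XOR token */
        uint64_t reserved;    /* 0x68: unused half of the JMP_sp_rsvd slot */
        uint64_t d8_d15[8];   /* 0x70..0xa8: raw bits of callee-saved d8..d15 */
    };

    _Static_assert(offsetof(struct arm64_jmp_layout, fp_munged) == 0x50, "fp offset");
    _Static_assert(offsetof(struct arm64_jmp_layout, lr_munged) == 0x58, "lr offset");
    _Static_assert(offsetof(struct arm64_jmp_layout, sp_munged) == 0x60, "sp offset");
    _Static_assert(offsetof(struct arm64_jmp_layout, d8_d15) == 0x70, "d8 offset");

    /* XOR is its own inverse, so one function both munges and unmunges. */
    uint64_t ptr_unmunge(uint64_t stored, uint64_t token)
    {
        return stored ^ token;
    }
    ```

    Byte offset 0x50 is index 10 when the buffer is read as 64-bit words (the right-hand column of the dump) and indices 20 and 21 when it is read as int (the left-hand column).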
    

    Notice how the high bits at offsets 10 and 12 (in the right-hand, 64-bit column) are identical? That's because the high bits in these registers are normally zero in userland, so if you XOR a 64-bit constant into them, the high bits will be the same. That is not at all what I see in your jmp_buf dump. Your values at these indices look more like bitmasks. Where I do see this pattern, however, is at indices 1 and 2. Which is precisely how the x86_64 implementation works:

    ;-- __longjmp:
    0x00003d2c      dbe3           fninit
    0x00003d2e      85f6           test esi, esi
    0x00003d30      b801000000     mov eax, 1
    0x00003d35      0f45c6         cmovne eax, esi
    0x00003d38      488b1f         mov rbx, qword [rdi]
    0x00003d3b      488b7708       mov rsi, qword [rdi + 8]
    0x00003d3f      654833342538.  xor rsi, qword gs:[0x38]
    0x00003d48      4889f5         mov rbp, rsi
    0x00003d4b      488b7710       mov rsi, qword [rdi + 0x10]
    0x00003d4f      654833342538.  xor rsi, qword gs:[0x38]
    0x00003d58      4c0fbe26       movsx r12, byte [rsi]
    0x00003d5c      4889f4         mov rsp, rsi
    0x00003d5f      4c8b6718       mov r12, qword [rdi + 0x18]
    0x00003d63      4c8b6f20       mov r13, qword [rdi + 0x20]
    0x00003d67      4c8b7728       mov r14, qword [rdi + 0x28]
    0x00003d6b      4c8b7f30       mov r15, qword [rdi + 0x30]
    0x00003d6f      488b7738       mov rsi, qword [rdi + 0x38]
    0x00003d73      654833342538.  xor rsi, qword gs:[0x38]
    0x00003d7c      d96f4c         fldcw word [rdi + 0x4c]
    0x00003d7f      0fae5748       ldmxcsr dword [rdi + 0x48]
    0x00003d83      fc             cld
    0x00003d84      ffe6           jmp rsi
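
    The shared-high-bits observation can be checked on the numbers from both dumps without knowing the token: XORing two munged slots cancels the token and leaves the XOR of the original values. A small sketch, with the slot values copied from the dumps in this post and all names invented for the example (it assumes neither value carries a pointer authentication signature):

    ```c
    #include <stdint.h>

    /* Slots read as 64-bit words from the dumps in this post. */
    #define ARM64_FP_SLOT 0x6f5a9a069b197d28ULL /* arm64 dump, index 10 */
    #define ARM64_LR_SLOT 0x2a649a06f4c6a30cULL /* arm64 dump, index 11 */
    #define ARM64_SP_SLOT 0x6f5a9a069b197c08ULL /* arm64 dump, index 12 */
    #define X86_RBP_SLOT  0x2918cc39c9b56814ULL /* question's dump, index 1 */
    #define X86_RSP_SLOT  0x2918cc39c9b56f44ULL /* question's dump, index 2 */

    /* Upper 32 bits of a 64-bit slot. */
    uint32_t high32(uint64_t v)
    {
        return (uint32_t)(v >> 32);
    }

    /* (a ^ token) ^ (b ^ token) == a ^ b, so the token drops out. */
    uint64_t munged_xor(uint64_t a, uint64_t b)
    {
        return a ^ b;
    }
    ```

    In the arm64 dump the fp and sp slots share the upper half 0x6f5a9a06 and XOR to 0x120, while the lr slot starts with 0x2a649a06. In the question's dump, slots 1 and 2 share the upper half 0x2918cc39 and XOR to 0x750, which is what two nearby stack addresses munged with the same token would look like.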
    

    So yeah, that's how I knew. But back to my arm64 dump above, if we look at the assembly, then we would expect not just indices 10 and 12 to have the same high bits, but also index 11 (lr), but that's not the case. So what's going on there?

    Well, it turns out we're not really running the arm64 version either. We're running arm64e! In case that doesn't mean anything to you, it's a separate Apple ABI with support for ARMv8.3 Pointer Authentication. Mach-O loaders will always prefer arm64e slices over arm64 if the hardware supports it, and since all Apple Silicon Macs do so, and since all stock binaries ship with an arm64e slice, libsystem_platform.dylib will always have its arm64e slice loaded (unless you manage to manually mess with it enough, maybe?). Either way, here are the real implementations of _setjmp and _longjmp that are actually running:

    ;-- __setjmp:
    0x00001a54      7f2303d5       pacibsp
    0x00001a58      ea031daa       mov x10, x29
    0x00001a5c      ea0fc1da       pacdb x10, sp
    0x00001a60      ec030091       mov x12, sp
    0x00001a64      a97d9952       mov w9, 0xcbed
    0x00001a68      2c0dc1da       pacdb x12, x9
    0x00001a6c      70d03bd5       mrs x16, tpidrro_el0
    0x00001a70      101e40f9       ldr x16, [x16, 0x38]
    0x00001a74      4a0110ca       eor x10, x10, x16
    0x00001a78      cb0310ca       eor x11, x30, x16
    0x00001a7c      8c0110ca       eor x12, x12, x16
    0x00001a80      135000a9       stp x19, x20, [x0]
    0x00001a84      155801a9       stp x21, x22, [x0, 0x10]
    0x00001a88      176002a9       stp x23, x24, [x0, 0x20]
    0x00001a8c      196803a9       stp x25, x26, [x0, 0x30]
    0x00001a90      1b7004a9       stp x27, x28, [x0, 0x40]
    0x00001a94      0a2c05a9       stp x10, x11, [x0, 0x50]
    0x00001a98      0c3000f9       str x12, [x0, 0x60]
    0x00001a9c      0824076d       stp d8, d9, [x0, 0x70]
    0x00001aa0      0a2c086d       stp d10, d11, [x0, 0x80]
    0x00001aa4      0c34096d       stp d12, d13, [x0, 0x90]
    0x00001aa8      0e3c0a6d       stp d14, d15, [x0, 0xa0]
    0x00001aac      00008052       mov w0, 0
    0x00001ab0      ff0f5fd6       retab
    
    ;-- __longjmp:
    0x00001ab4      135040a9       ldp x19, x20, [x0]
    0x00001ab8      155841a9       ldp x21, x22, [x0, 0x10]
    0x00001abc      176042a9       ldp x23, x24, [x0, 0x20]
    0x00001ac0      196843a9       ldp x25, x26, [x0, 0x30]
    0x00001ac4      1b7044a9       ldp x27, x28, [x0, 0x40]
    0x00001ac8      0a2c45a9       ldp x10, x11, [x0, 0x50]
    0x00001acc      0c3040f9       ldr x12, [x0, 0x60]
    0x00001ad0      0824476d       ldp d8, d9, [x0, 0x70]
    0x00001ad4      0a2c486d       ldp d10, d11, [x0, 0x80]
    0x00001ad8      0c34496d       ldp d12, d13, [x0, 0x90]
    0x00001adc      0e3c4a6d       ldp d14, d15, [x0, 0xa0]
    0x00001ae0      70d03bd5       mrs x16, tpidrro_el0
    0x00001ae4      101e40f9       ldr x16, [x16, 0x38]
    0x00001ae8      4a0110ca       eor x10, x10, x16
    0x00001aec      7e0110ca       eor x30, x11, x16
    0x00001af0      8c0110ca       eor x12, x12, x16
    0x00001af4      a97d9952       mov w9, 0xcbed
    0x00001af8      2c1dc1da       autdb x12, x9
    0x00001afc      9f0140f9       ldr xzr, [x12]
    0x00001b00      9f010091       mov sp, x12
    0x00001b04      ea1fc1da       autdb x10, sp
    0x00001b08      fd030aaa       mov x29, x10
    0x00001b0c      3f000071       cmp w1, 0
    0x00001b10      20149f1a       csinc w0, w1, wzr, ne
    0x00001b14      ff0f5fd6       retab
    

    This does not seem to be open source at all at this point. And if I had to guess, it's probably also not stable ABI. The entire arm64e sub-architecture is considered not stable, subject to change without notice, and on macOS requires the -arm64e_preview_abi kernel boot-arg (which in turn requires downgraded OS security) in order for you to even be allowed to run non-Apple-signed arm64e binaries. So yeah, just don't rely on this code staying the same.

    But alright, lr is different due to pointer authentication. setjmp does a pacibsp, and then longjmp does the corresponding retab. So far so good, except fp and sp also have their pacdb and autdb, which should add pointer authentication just the same. But here's another little detail: if you're running an arm64 binary on arm64e-capable hardware, then you will have arm64e libraries loaded in your process, but you're still running as just arm64. The pointer authentication instructions are turned off for your process via hardware flags in SCTLR_EL1, except for the IB keys. So the pacib* family of instructions will work, the pacia*, pacda* and pacdb* ones will not. But when reading the fp/lr/sp values yourself, basically you'll want to strip the pointer authentication bits, and let the hardware decide whether that's a nop or not.
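
    Whether your own binary is plain arm64 or arm64e can be checked at compile time. The sketch below assumes clang's ptrauth_calls feature test and Apple's __arm64e__ predefined macro; built_with_ptrauth is a made-up name, and the fallback macro is only there so compilers without __has_feature still accept the code.

    ```c
    /* Compilers without __has_feature get a fallback that always says "no". */
    #ifndef __has_feature
    #define __has_feature(x) 0
    #endif

    /* Hypothetical helper: 1 if this translation unit was built with
     * pointer authentication for calls (as arm64e binaries are), else 0. */
    int built_with_ptrauth(void)
    {
    #if __has_feature(ptrauth_calls) || defined(__arm64e__)
        return 1;
    #else
        return 0;
    #endif
    }
    ```

    A program built with a plain -arch arm64 should get 0 here even though the system libraries in the same process are arm64e.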

    So here's some code that prints these three registers on arm64, on macOS 13.4.1, no guarantees on forwards or backwards compatibility:

    #include <setjmp.h>
    #include <stdint.h>
    #include <stdio.h>
    
    /* ---------- imported from xnu/libsyscall/tsd ---------- */
    
    #define __TSD_PTR_MUNGE 7
    
    __attribute__((always_inline, const)) static __inline__ void** _os_tsd_get_base(void)
    {
    #if defined(__arm__)
        uintptr_t tsd;
        __asm__("mrc p15, 0, %0, c13, c0, 3\n"
                    "bic %0, %0, #0x3\n" : "=r" (tsd));
        /* lower 2-bits contain CPU number */
    #elif defined(__arm64__)
        /*
         * <rdar://73762648> Do not use __builtin_arm_rsr64("TPIDRRO_EL0")
         * so that the "const" attribute takes effect and repeated use
         * is coalesced properly.
         */
        uint64_t tsd;
        __asm__ ("mrs %0, TPIDRRO_EL0" : "=r" (tsd));
    #endif
    
        return (void**)(uintptr_t)tsd;
    }
    
    __attribute__((always_inline)) static __inline__ void* _os_tsd_get_direct(unsigned long slot)
    {
        return _os_tsd_get_base()[slot];
    }
    
    __attribute__((always_inline, const)) static __inline__ uintptr_t _os_ptr_munge_token(void)
    {
        return (uintptr_t)_os_tsd_get_direct(__TSD_PTR_MUNGE);
    }
    
    /* ---------- end of import ---------- */
    
    static inline uint64_t xpaci(uint64_t val)
    {
        __asm__ volatile
        (
            ".arch v8.3a\n"
            "xpaci %0\n"
            : "+r" (val)
        );
        return val;
    }
    
    static inline uint64_t xpacd(uint64_t val)
    {
        __asm__ volatile
        (
            ".arch v8.3a\n"
            "xpacd %0\n"
            : "+r" (val)
        );
        return val;
    }
    
    int main(void)
    {
        jmp_buf reg;
        setjmp(reg);
    
        uintptr_t token = _os_ptr_munge_token();
        uint64_t *u64 = (uint64_t*)reg;
        uint64_t fp = u64[10] ^ token,
                 lr = u64[11] ^ token,
                 sp = u64[12] ^ token;
    
        printf("fp: 0x%llx\n", xpacd(fp));
        printf("lr: 0x%llx\n", xpaci(lr));
        printf("sp: 0x%llx\n", xpacd(sp));
    
        return 0;
    }
    

    Note 1: normally the instructions xpaci and xpacd would not be accepted by the assembler if targeting arm64, but with the .arch v8.3a directive, we can convince it to let us pass.

    Note 2: this code will only work on ARMv8.3 hardware, due to the xpaci and xpacd instructions. If your code might run on hardware that doesn't support PAC (such as iOS devices before A12), then you'll need to change those. For xpaci it's possible to construct a backwards-compatible variant with xpaclri, which is encoded in historical NOP instruction space (see another answer of mine), but for xpacd there is no equivalent in NOP space, so what you'd have to do there is first determine what kind of hardware you're running on (via things like xpaclri), and then conditionally call into pointer authentication gadgets. But that's out of scope for this answer. :)
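
    For the xpaci half of Note 2, a backwards-compatible variant might look like the sketch below. It is an illustration under assumptions, not a tested drop-in: the function name is made up, xpaclri is emitted as its hint encoding (hint #7) so it assembles without any .arch directive, and on targets other than arm64 the function returns its input unchanged. xpaclri only operates on x30, so the value is routed through it and x30 is declared as clobbered.

    ```c
    #include <stdint.h>

    /* Strip a PAC signature from an instruction pointer via XPACLRI.
     * On CPUs without pointer authentication the hint executes as a NOP
     * and the value comes back unchanged. */
    uint64_t strip_pac_lr_compat(uint64_t val)
    {
    #if defined(__aarch64__) || defined(__arm64__)
        __asm__ volatile(
            "mov x30, %0\n" /* xpaclri works on x30 only */
            "hint #7\n"     /* encoding of xpaclri */
            "mov %0, x30\n"
            : "+r"(val)
            :
            : "x30");       /* the compiler saves and restores lr as needed */
    #endif
        return val;
    }
    ```

    In the program above this would stand in for xpaci(lr) only; the two xpacd calls for fp and sp have no such equivalent, as the note says.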