How to resolve EXC_BAD_ACCESS (code=2) when accessing an mmap-allocated executable page on Apple silicon (macOS, M1 Mac mini)?

I have written a test program to implement something similar to self-modifying code on Apple Silicon.

int main() {
 
    uint8_t *instr;
    uint32_t instr1 = 0x8b000000; // add x0, x0, x0
    uint32_t instr2 = 0xd65f03c0; // ret
 
    if(pthread_jit_write_protect_supported_np() == 1)
        printf("jit write supported\n");
    else
        printf("jit write not supported\n");

    pthread_jit_write_protect_np(1);
    instr = (uint8_t*)mmap(NULL, 1024, PROT_READ|PROT_EXEC|PROT_WRITE, MAP_PRIVATE|MAP_ANON|MAP_JIT, 0, 0) + 4084;
    pthread_jit_write_protect_np(0);

    if(instr == MAP_FAILED){
        perror("mmap");
        exit(-1);
    }

    printf("instr addr : %lx\n", (uintptr_t)instr);
    memcpy(instr, &instr, 4);
    memcpy(instr+4, &instr2, 4);

    printf("instr1 is %x\n", *(uint32_t *)instr);
    printf("instr2 is %x\n", *(uint32_t *)(instr+4));

    asm volatile(
    "eor x0, x0, x0\n"
    "eor x1, x1, x1\n"
    "eor x2, x2, x2\n"
    "eor x3, x3, x3\n"
    );

    asm volatile(
    "ldr x1, %[ptr]\n"
    "br x1\n"
    ::[ptr]"m"(instr)
    );

    return 0;
}

I allocate a 4KB memory region with mmap, allowing read, write, and execute permissions. Then, I use memcpy to write the two assembly instructions to this memory region. After that, I zero registers x0 through x3 with inline assembly and branch the program counter (pc) to the previously allocated page. After the branch, instr1 and instr2 should execute sequentially. However, as soon as the branch reaches that memory region, the program aborts with an EXC_BAD_ACCESS code=2 error.

Through Google searches, I've come to believe that the issue lies with Apple code signing. It seems that code running from memory on Apple silicon is denied access if it is not code signed. So, I've been looking for a way to allow access through codesign, but I have been unable to find a way to sign memory allocated through mmap. Is there any way to resolve this issue?

Solution

The immediate issue is that you call pthread_jit_write_protect_np(0) and never flip it back to 1. This leaves your thread in a state where JIT pages are rw-, so trying to execute from there will fault. Call pthread_jit_write_protect_np(1) after the memcpy()s.

The next issue is that you use ldr x1, %[ptr]. This loads x1 from the pointer rather than moving the pointer into x1, so x1 will be the 8 bytes you wrote. Replace that with mov x1, %[ptr] and change "m" to "r".

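Then there's caching. Include <libkern/OSCacheControl.h> and do a sys_dcache_flush(instr, 0x8) before the final call to pthread_jit_write_protect_np(1) and a sys_icache_invalidate(instr, 0x8) afterwards. Without this, the CPU may still fetch stale instruction bytes even though the new ones are visible to data reads.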

Then there's the issue that you're writing the wrong thing:

memcpy(instr, &instr, 4);

You meant to take the address of instr1 here, not instr. You're currently writing the low 4 bytes of the pointer.
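The fix can be checked in isolation, without any JIT page. The following is only a sketch: write_instructions is a hypothetical helper name, and buf can be any writable buffer of at least 8 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies the two instruction words into buf
 * (which must hold at least 8 bytes) and returns the first word
 * read back from the buffer. */
uint32_t write_instructions(uint8_t *buf)
{
    uint32_t instr1 = 0x8b000000; /* add x0, x0, x0 */
    uint32_t instr2 = 0xd65f03c0; /* ret */
    uint32_t first;

    assert(buf != NULL);

    /* Take the address of the instruction word, not of the pointer. */
    memcpy(buf, &instr1, sizeof instr1);
    memcpy(buf + 4, &instr2, sizeof instr2);

    /* Read back through memcpy to avoid unaligned or aliasing access. */
    memcpy(&first, buf, sizeof first);
    return first;
}
```

With this, the first word read back is 0x8b000000 rather than the low half of a pointer.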

Now your shellcode is being copied correctly and actually executes, but the ret is a problem. You call into it with br, so x30 is stale and points to wherever the last function returned to, which is most likely your last printf. Change br to blr.

And then - why do you map 0x400 bytes but then add 0xff4 to that pointer? In practice that is likely to not be a problem because pages under arm64 XNU are 16KiB, but just... why? You could at the very least not add anything to the pointer, then any size big enough to hold your instructions would be fine.

And then there's the missing register clobbers, which may just happen to not cause problems right now, but sooner or later that will interfere with code generated by the compiler.
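Putting the points above together, a corrected version might look roughly like the sketch below. Treat it as an illustration under assumptions, not a drop-in implementation: run_jit_double, kCode and HAVE_APPLE_JIT are names made up for this example, the Apple-specific path is only compiled on arm64 macOS (elsewhere the function just returns -1), and it assumes the process is allowed to create MAP_JIT mappings. The clobber list covers only what this particular two-instruction payload and the blr touch; code that modifies more registers would need them listed too.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#if defined(__APPLE__) && defined(__aarch64__)
#include <sys/mman.h>
#include <pthread.h>
#include <libkern/OSCacheControl.h>
#define HAVE_APPLE_JIT 1
#else
#define HAVE_APPLE_JIT 0
#endif

/* add x0, x0, x0 ; ret */
const uint32_t kCode[2] = { 0x8b000000u, 0xd65f03c0u };

/* Hypothetical helper: runs kCode with x0 = value and stores the
 * resulting x0 in *result. Returns 0 on success, -1 if mmap fails
 * or the platform is not arm64 macOS. */
int run_jit_double(uint64_t value, uint64_t *result)
{
    assert(result != NULL);
#if HAVE_APPLE_JIT
    const size_t len = sizeof kCode;

    /* No offset added to the pointer; fd is -1 for an anonymous map. */
    void *page = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_PRIVATE | MAP_ANON | MAP_JIT, -1, 0);
    if (page == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    pthread_jit_write_protect_np(0);  /* JIT pages writable for this thread */
    memcpy(page, kCode, len);
    sys_dcache_flush(page, len);      /* push the new bytes out of the d-cache */
    pthread_jit_write_protect_np(1);  /* JIT pages executable again */
    sys_icache_invalidate(page, len); /* drop stale instruction fetches */

    /* blr sets x30 so the ret comes back here; declare it clobbered. */
    register uint64_t x0 __asm__("x0") = value;
    __asm__ volatile("blr %[fn]"
                     : "+r"(x0)
                     : [fn] "r"(page)
                     : "x30", "memory");

    *result = x0;
    munmap(page, len);
    return 0;
#else
    (void)value; /* not arm64 macOS: nothing to run */
    return -1;
#endif
}
```

On Apple silicon, calling run_jit_double(21, &out) should return 0 and leave 42 in out, since add x0, x0, x0 doubles its input.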