ELF - Entry point patching with x86 zero-extended address

I've managed to patch the entry point of an ELF file so that it points somewhere else and executes a piece of code before returning to the original entry point (OEP). This is how I'm trying to jump back to the OEP:

mov rax, 0x4141414141414141  ( 48 b8 41 41 41 41 41 41 41 41 )
jmp rax                      (ff e0)

I have an array with these opcodes which I patch as soon as I parse the ELF header to get the entry point:

uint64_t oep = ehdr->e_entry;
memcpy(&opcode[23], &oep, 8);

But the entry point is always something like 0x47fe8d, which invalidates the rest of the array, since the opcode expects an 8-byte address without zeros. I tried sign-extending the address instead, like 0xffffffff47fe8d, but it didn't work. This appears to be normal behavior, since x86 addresses are zero-extended.

EDIT: The shellcode array looks like this:

 _start:
       xor rax, rax
       xor rdi, rdi
       xor rsi, rsi
       jmp get_str
 shellcode:
       pop rsi
       mov al, 1
       mov dil, 1
       mov dl, 9
       syscall ; writes a string

       mov rax, 0x4141414141414141 ; patched with the EP
       jmp rax
 get_str:
       call shellcode
       db "strings!", 0xa

 // write syscall + jmp OEP (mov rax, addr, jmp rax). patch at 23
unsigned char shellcode[] = "\x48\x31\xc0\x48\x31\xff\x48\x31\xf6\xeb"
                  "\x16\x5e\xb0\x01\x40\xb7\x01\xb2\x09\x0f"
                  "\x05\x48\xb8\x41\x41\x41\x41\x41\x41\x41"
                  "\xff\xe0\xe8\xe5\xff\xff\xff\x68\x69\x6a"
                  "\x61\x63\x6b\x65\x64\x0a";

I made a function which prints this array before patching it. Here's what it looks like:

\x48\x31\xc0\x48\x31\xff\x48\x31\xf6\xeb\x16\x5e\xb0\x01\x40\xb7\x01\xb2\x09\x0f\x05\x48\xb8\x41\x41\x41\x41\x41\x41\x41\xff\xe0\xe8\xe5\xff\xff\xff\x68\x69\x6a\x61\x63\x6b\x65\x64\x0a

But after patching the mov immediate with the entry point address (the bytes printed below are 0x401b20 in little-endian order), the higher bytes of the address become zero:

\x48\x31\xc0\x48\x31\xff\x48\x31\xf6\xeb\x16\x5e\xb0\x01\x40\xb7\x01\xb2\x09\x0f\x05\x48\xb8\x20\x1b\x40

And this for some reason causes a segmentation fault. I used IDA to search for the entry point of the patched file and here's what I found:

LOAD:000000000047FE8D start:                                  ; DATA XREF: LOAD:0000000000400018↑o
LOAD:000000000047FE8D                 xor     rax, rax
LOAD:000000000047FE90                 xor     rdi, rdi
LOAD:000000000047FE93                 xor     rsi, rsi
LOAD:000000000047FE96
LOAD:000000000047FE96 loc_47FE96:                             ; CODE XREF: LOAD:000000000047FEAC↓j
LOAD:000000000047FE96                 jmp     short loc_47FEAE
LOAD:000000000047FE98 ; ---------------------------------------------------------------------------
LOAD:000000000047FE98                 pop     rsi
LOAD:000000000047FE99                 mov     al, 1
LOAD:000000000047FE9B                 mov     dil, 1
LOAD:000000000047FE9E                 mov     dl, 9
LOAD:000000000047FEA0                 syscall                 ; $!
LOAD:000000000047FEA2                 mov     rax, offset _start
LOAD:000000000047FEAC                 loopne  loc_47FE96
LOAD:000000000047FEAE
LOAD:000000000047FEAE loc_47FEAE:                             ; CODE XREF: LOAD:loc_47FE96↑j
LOAD:000000000047FEAE                 in      eax, 0FFh       ; $!
LOAD:000000000047FEAE ; ---------------------------------------------------------------------------
LOAD:000000000047FEB0                 dq 6B63616A6968FFFFh
LOAD:000000000047FEB8                 db 65h, 64h, 0Ah
LOAD:000000000047FEB8 LOAD            ends

So, despite IDA wrongly decoding the instruction at 000000000047FEAC, it appears that the file has been successfully patched. The _start symbol leads to the following code:

public _start
_start proc near
endbr64
xor     ebp, ebp
mov     r9, rdx         ; rtld_fini
pop     rsi             ; argc
mov     rdx, rsp        ; ubp_av
and     rsp, 0FFFFFFFFFFFFFFF0h
push    rax
push    rsp             ; stack_end
mov     r8, offset __libc_csu_fini ; fini
mov     rcx, offset __libc_csu_init ; init
mov     rdi, offset main ; main
db      67h
call    __libc_start_main
hlt
_start endp

This ends up calling the original main function, so everything seems to be in order.

Upon further examination I found out that the instruction at 000000000047FEAE is the culprit, although I don't really understand why. This is the call instruction I used to push the address of the string onto the stack.

Why am I getting a Segmentation fault?

Solution

IDA isn't decoding it wrong; the hex-string version of your machine code is wrong. It is one \x41 byte short, so mov r64, imm64 consumes the following FF byte as part of its immediate instead of as the opcode for jmp. That's why the next two bytes, e0 e8, decode as loopne.

I noticed this by copy/pasting your C array into a .c and compiling that into a .o. Then I disassembled it with objdump -D -rwC -Mintel foo.o to get objdump to disassemble the .data section. It agrees with IDA, proving IDA was right and you did make a mistake in whatever you did to translate your NASM output into a hex string. (IDK why you're bothering to do that, instead of just linking with the NASM .o output to test it the normal way first, or what it has to do with modifying an ELF binary.)

 // write syscall + jmp OEP (mov rax, addr, jmp rax). patch at 23
unsigned char shellcode[] = "\x48\x31\xc0\x48\x31\xff\x48\x31\xf6\xeb"
                  "\x16\x5e\xb0\x01\x40\xb7\x01\xb2\x09\x0f"
                  "\x05\x48\xb8\x41\x41\x41\x41\x41\x41\x41"  // this is only 7 x41 bytes
                  "\xff\xe0\xe8\xe5\xff\xff\xff\x68\x69\x6a"
                  "\x61\x63\x6b\x65\x64\x0a";

objdump -D shows 48 b8 41 41 41 41 41 41 41 ff movabs rax,0xff41414141414141 - the most significant byte of your mov imm64 is the FF that's supposed to be the jmp opcode. Your C string only has 7 \x41 bytes.

You should also see the same thing if you disassemble within GDB at the instruction that faulted. It's probably the in instruction, which is privileged and therefore faults in user space.
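One way to catch this kind of off-by-one before it ever reaches the binary is to have the patcher check the bytes around the patch offset instead of blindly copying 8 bytes. The following is only a minimal sketch, assuming a little-endian host; patch_oep is a hypothetical helper name, not something from your code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: overwrite the imm64 of `mov rax, imm64` (48 b8 + 8 bytes)
 * at offset `off`, but only if the placeholder and the `jmp rax` (ff e0) that
 * must follow it are laid out as expected.
 * Returns 0 on success, -1 on a layout mismatch. */
int patch_oep(unsigned char *code, size_t len, size_t off, uint64_t oep)
{
    static const unsigned char jmp_rax[2] = { 0xff, 0xe0 };

    assert(code != NULL);

    if (off < 2 || len < off + 8 + sizeof jmp_rax)
        return -1;                      /* buffer too short for mov + jmp */
    if (code[off - 2] != 0x48 || code[off - 1] != 0xb8)
        return -1;                      /* not REX.W + B8 (mov rax, imm64) */
    for (size_t i = 0; i < 8; i++)
        if (code[off + i] != 0x41)
            return -1;                  /* placeholder is not 8 bytes of 0x41 */
    if (memcmp(code + off + 8, jmp_rax, sizeof jmp_rax) != 0)
        return -1;                      /* jmp rax is not right after the imm64 */

    /* Assumption: host byte order is little-endian, like the x86-64 target. */
    memcpy(code + off, &oep, sizeof oep);
    return 0;
}
```

With the array from the question, patch_oep(shellcode, sizeof shellcode - 1, 23, oep) would return -1, because the byte at offset 30 is 0xff rather than the eighth 0x41.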

Creating values that contain `0` in registers with shellcode

This part is easy. Load the value with some constant like -1 or 0x80 XORed or ADDed into it so that every byte of the immediate is non-zero, then undo that at run time with NOT, xor-immediate, or sub-immediate. Or pad with low garbage and right shift.

e.g. to create 3-byte 0x47fe8d in a register, you can do

   mov eax, 0x47fe8d61       ; (0x47fe8d << 8) + 'a'
   shr eax, 8

Writing a 32-bit register implicitly zero-extends to 64 bits, so this leaves
RAX = 0 0 0 0 0 47 fe 8d = 0x47fe8d.

    mov eax, ~0x47fe8d          ; none of the bytes are FF -> none of ~x are 0
    not eax                     ; still leaving the upper 32 bits zeroed
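If the patcher has to generate these immediates for an arbitrary target address, the arithmetic can be done on the C side. This is a rough sketch under the assumption that the target fits in 24 bits for the shift variant; the function names no_zero_bytes, shift_imm and not_imm are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if none of the four bytes of imm is 0x00. */
bool no_zero_bytes(uint32_t imm)
{
    for (int i = 0; i < 4; i++)
        if (((imm >> (8 * i)) & 0xff) == 0)
            return false;
    return true;
}

/* Immediate for `mov eax, imm32 ; shr eax, 8`:
 * the target goes in the top 3 bytes, a non-zero filler in the low byte.
 * Assumption: target fits in 24 bits. */
uint32_t shift_imm(uint32_t target, uint8_t filler)
{
    assert(target <= 0xffffff && filler != 0);
    return (target << 8) | filler;
}

/* Immediate for `mov eax, imm32 ; not eax`. */
uint32_t not_imm(uint32_t target)
{
    return ~target;
}
```

For 0x47fe8d, shift_imm(0x47fe8d, 'a') gives 0x47fe8d61 and not_imm(0x47fe8d) gives 0xffb80172. Both pass no_zero_bytes, but the check is still worth running, since other addresses can produce a zero byte with one trick and not the other.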
