I am currently trying to JIT-compile code from Python. I found PeachPy via another SO question. For the most part this is easy, but I am failing to call external C functions. I want to call putchar, a function with a single argument. Since I am on Windows with x86-64, I expect to put the single argument into rcx and then execute call with the address of the function. For this I wrote this code:
from peachpy import *
from peachpy.x86_64 import *
import ctypes
putchar_address = ctypes.addressof(ctypes.cdll.msvcrt.putchar)
c = Argument(uint64_t)
with Function("p", (c,), int64_t) as asm_function:
    LOAD.ARGUMENT(rcx, c)
    MOV(r8, putchar_address)
    CALL(r8)
    RETURN(rax)
raw = asm_function.finalize(abi.detect()).encode()
python_function = raw.load()
print(python_function(48))
This crashes on the final line with OSError: exception: access violation writing 0x0000029E58C1A978.
I looked through many other SO answers, but none of them really solves this problem; the code above is actually the result of reading them. The most useful was this one: Handling calls to (potentially) far away ahead-of-time compiled functions from JITed code
Edit: A few more things I tried.
PeachPy specifically does not expose rsp directly, claiming that it already deals with it correctly. But I can still manipulate it myself, leading to this code:
from peachpy.x86_64.registers import rsp
#...
LOAD.ARGUMENT(rcx, c)
SUB(rsp, 40)
MOV(r8, putchar_address)
CALL(r8)
ADD(rsp, 40)
RETURN(rax)
This changes the error to a crash with exit code 0xC0000409 (STATUS_STACK_BUFFER_OVERRUN), which as far as I understand means a stack access beyond the top of the stack.
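For context on the constant 40: the Windows x64 calling convention asks the caller to reserve 32 bytes of shadow space and to keep rsp 16-byte aligned at the call instruction. The sketch below only does that arithmetic. The helper name rsp_mod16_at_call is made up for illustration and is not part of PeachPy, and it assumes the stack was correctly aligned in whoever called the generated function.

```python
SHADOW_SPACE = 32      # bytes the Windows x64 caller must reserve for the callee
RETURN_ADDRESS = 8     # pushed by the CALL that entered our function


def rsp_mod16_at_call(sub_bytes, pushes=0):
    """Return rsp % 16 right before an inner CALL (hypothetical helper).

    Assumes rsp was 16-byte aligned in our caller just before it called us,
    so on entry rsp % 16 == 8.
    """
    used = RETURN_ADDRESS + 8 * pushes + sub_bytes
    return (-used) % 16


# SUB(rsp, 40) with no pushes: 32 bytes of shadow space + 8 bytes of padding.
print(rsp_mod16_at_call(40))             # 0, i.e. aligned
print(rsp_mod16_at_call(SHADOW_SPACE))   # 8, i.e. misaligned without the padding
```

Each extra PUSH in the prologue moves rsp by another 8 bytes, so the padding has to be recomputed whenever the generator saves registers on its own.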
Here is the disassembly of what PeachPy generates:
Without rsp
0: 49 b8 a8 a8 1a 84 1f movabs r8,0x21f841aa8a8
7: 02 00 00
a: 41 ff d0 call r8
d: c3 ret
With rsp
0: 48 83 ec 28 sub rsp,0x28
4: 49 b8 a8 98 ad 9e ac movabs r8,0x1ac9ead98a8
b: 01 00 00
e: 41 ff d0 call r8
11: 48 83 c4 28 add rsp,0x28
15: c3 ret
(From https://defuse.ca/online-x86-assembler.htm)
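To check which address actually ended up in the code, the immediate of the movabs can be decoded by hand. This is a small sketch using the bytes of the first listing above; the variable names are mine. The two bytes 49 b8 encode movabs r8, imm64, and the next eight bytes are the address in little-endian order.

```python
# Bytes of the "without rsp" listing, copied from the disassembly above.
code = bytes.fromhex("49b8a8a81a841f02000041ffd0c3")

# 49 b8 = REX prefix and opcode for `movabs r8, imm64`;
# the following 8 bytes are the immediate, least significant byte first.
assert code[:2] == bytes.fromhex("49b8")
target = int.from_bytes(code[2:10], "little")

print(hex(target))  # 0x21f841aa8a8
```

The decoded value can then be compared against putchar_address, or against the address a debugger reports for the function.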
Based on the output of the C compiler (here: https://godbolt.org/z/BKgk7Y), I created the following code:
MOV([rsp + 16], rdx)
MOV([rsp + 8], rcx)
SUB(rsp, 40)
MOV(rcx, [rsp + 56])
CALL([rsp + 48])
ADD(rsp, 40)
RETURN(rax)
which produces the same assembly as the C compiler:
0: 48 89 54 24 10 mov QWORD PTR [rsp+0x10],rdx
5: 48 89 4c 24 08 mov QWORD PTR [rsp+0x8],rcx
a: 48 83 ec 28 sub rsp,0x28
e: 48 8b 4c 24 38 mov rcx,QWORD PTR [rsp+0x38]
13: ff 54 24 30 call QWORD PTR [rsp+0x30]
17: 48 83 c4 28 add rsp,0x28
1b: c3 ret
This fails as well, which suggests that the problem is not in the generated code. (I did not use putchar here, and I still get the same exit code 0xC0000409.)
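The offsets in that listing are easier to follow with the stack layout written out. Below is a small sketch of the arithmetic, assuming the usual Windows x64 layout at function entry (return address at [rsp], followed by the home slots for rcx and rdx); the names are only for illustration.

```python
# Offsets relative to rsp at function entry (Windows x64, assumed layout).
entry_slots = {0: "return address", 8: "home slot for rcx", 16: "home slot for rdx"}

FRAME = 40  # SUB(rsp, 40)

# After the SUB, the same memory is FRAME bytes further away from rsp.
after_sub = {offset + FRAME: name for offset, name in entry_slots.items()}

print(after_sub[48])  # home slot for rcx -> operand of CALL([rsp + 48])
print(after_sub[56])  # home slot for rdx -> loaded by MOV(rcx, [rsp + 56])
```

Read this way, the sequence calls whatever pointer arrived in rcx and passes the value that arrived in rdx as its first argument.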
With the help of @PeterCordes I figured out the important problems.
ctypes.addressof(ctypes.cdll.msvcrt.putchar) does not give the start of the code, but the address of a pointer to the start of the code. Problem 1 is easy to solve, and Problem 2 needed a bit of tinkering. In the end, this code works:
c_void_p_p = ctypes.POINTER(ctypes.c_void_p)
putchar_address = ctypes.addressof(ctypes.cast(ctypes.cdll.msvcrt.putchar, c_void_p_p).contents)
func_ptr = Argument(ptr())
c = Argument(uint64_t)
with Function("p", (c,), int64_t) as asm_function:
    MOV(r12, putchar_address)
    SUB(rsp, 40)
    CALL(r12)
    ADD(rsp, 40)
    RETURN()
raw = asm_function.finalize(abi.detect()).encode()
print(raw.code_section.content.hex())
python_function = raw.load()
print(python_function(54))
This generates this assembly:
0: 41 54 push r12
2: 49 bc 90 77 75 4d fa movabs r12,0x7ffa4d757790
9: 7f 00 00
c: 48 83 ec 28 sub rsp,0x28
10: 41 ff d4 call r12
13: 48 83 c4 28 add rsp,0x28
17: 41 5c pop r12
19: c3 ret
And it works exactly as expected.
(Just remember which registers are saved/need to be saved.)
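The addressof pitfall can be reproduced without any generated code. The sketch below uses ctypes.pythonapi.Py_GetVersion only because it exists on every platform; that choice is mine, and any ctypes function object, such as ctypes.cdll.msvcrt.putchar on Windows, should behave the same way. It also shows ctypes.cast(..., ctypes.c_void_p).value as a shorter way to get the entry address, offered as an alternative to the expression used above rather than as what that code does.

```python
import ctypes

# Any ctypes function object will do; Py_GetVersion is used only because it
# is available everywhere (on Windows, ctypes.cdll.msvcrt.putchar works too).
func = ctypes.pythonapi.Py_GetVersion

# Address of the ctypes object's internal buffer, which *holds* the pointer.
holder_address = ctypes.addressof(func)

# Address of the first instruction of the function itself.
entry_address = ctypes.cast(func, ctypes.c_void_p).value

# Reading a pointer-sized value from the holder yields the entry address.
stored = ctypes.c_void_p.from_address(holder_address).value

print(hex(holder_address), hex(entry_address))
print(stored == entry_address)  # True

# The expression used in the working code above yields the same address.
c_void_p_p = ctypes.POINTER(ctypes.c_void_p)
via_contents = ctypes.addressof(ctypes.cast(func, c_void_p_p).contents)
print(via_contents == entry_address)  # True
```

Calling through holder_address, as the first attempt did, jumps into the ctypes object's data instead of into the function, which fits the access violation reported above.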