I'm trying to assemble and call code at runtime in an interpreter project written in Rust. I'm using the assembler crate for this. I'd like to have extern "C" fn wrappers around important runtime functionality that JIT'ed code can call directly. To minimise complexity in the generated code segments, I'd like to keep the interpreter state as a global variable.
However, the program keeps crashing whenever I try to access and/or modify any global state. A simple println!("hello world") crashes with a general protection fault, seemingly when stdio::_print() accesses stdout. A simpler example with a static mut variable dumps core as well. Interestingly, the former also crashes under Valgrind, with a neat stack trace. The latter runs without failing under Valgrind and even passes the assertions that indicate the program works as expected. Note that the static variable does not need to be mutable: merely reading it is enough to cause a crash.
I'm quite ignorant of the mmap details the assembler crate uses underneath, but I wasn't able to find any hint of why my approach crashes. Any guidance would be greatly appreciated.
I've created a Repl.it repl with an MRE.
use anyhow::Result;
use assembler::*;
use assembler::mnemonic_parameter_types::{registers::*, immediates::*};

const CHUNK_LENGTH: usize = 4096;
const LABEL_COUNT: usize = 64;

static mut X: u32 = 0;

#[no_mangle]
unsafe extern "C" fn foo() {
    // printing here will lead to a coredump,
    // Valgrind will provide more insight (general protection fault)
    // println!("hello world");
    // modifying a global variable instead will also dump
    // core but will run without fail in Valgrind
    X += 1
}

fn main() -> Result<()> {
    let mut memory_map = ExecutableAnonymousMemoryMap::new(CHUNK_LENGTH, true, true)?;
    let mut instr_stream = memory_map.instruction_stream(&InstructionStreamHints {
        number_of_labels: LABEL_COUNT,
        ..Default::default()
    });
    let f = instr_stream.nullary_function_pointer::<i64>();
    instr_stream.call_function(foo as unsafe extern "C" fn());
    instr_stream.mov_Register64Bit_Immediate64Bit(Register64Bit::RAX, Immediate64Bit(0x123456789abcdef0));
    instr_stream.ret();
    instr_stream.finish();
    assert_eq!(unsafe { f() }, 0x123456789abcdef0);
    assert_eq!(unsafe { X }, 1);
    Ok(())
}
Here are the Valgrind outputs for both cases.
Global variable:
==4186== Memcheck, a memory error detector
==4186== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==4186== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==4186== Command: target/debug/jit-ffi-fault-mre
==4186==
==4186==
==4186== HEAP SUMMARY:
==4186== in use at exit: 0 bytes in 0 blocks
==4186== total heap usage: 14 allocs, 14 frees, 199,277 bytes allocated
==4186==
==4186== All heap blocks were freed -- no leaks are possible
==4186==
==4186== For lists of detected and suppressed errors, rerun with: -s
==4186== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
println!:
==4341== Memcheck, a memory error detector
==4341== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==4341== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==4341== Command: target/debug/jit-ffi-fault-mre
==4341==
==4341==
==4341== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==4341== General Protection Fault
==4341== at 0x13FB7A: std::io::stdio::_print (stdio.rs:1028)
==4341== by 0x1174B0: foo (main.rs:15)
==4341== by 0x4E58004: ???
==4341== by 0x11A1BC: jit_ffi_fault_mre::main (main.rs:35)
==4341== by 0x113FEA: core::ops::function::FnOnce::call_once (function.rs:248)
==4341== by 0x11497D: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:122)
==4341== by 0x114E70: std::rt::lang_start::{{closure}} (rt.rs:145)
==4341== by 0x13CA95: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:280)
==4341== by 0x13CA95: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:492)
==4341== by 0x13CA95: try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:456)
==4341== by 0x13CA95: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:137)
==4341== by 0x13CA95: {closure#2} (rt.rs:128)
==4341== by 0x13CA95: do_call<std::rt::lang_start_internal::{closure_env#2}, isize> (panicking.rs:492)
==4341== by 0x13CA95: try<isize, std::rt::lang_start_internal::{closure_env#2}> (panicking.rs:456)
==4341== by 0x13CA95: catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> (panic.rs:137)
==4341== by 0x13CA95: std::rt::lang_start_internal (rt.rs:128)
==4341== by 0x114E3F: std::rt::lang_start (rt.rs:144)
==4341== by 0x11A37B: main (in /home/runner/UnsightlyAwfulPhases/jit-ffi-fault-mre/target/debug/jit-ffi-fault-mre)
==4341==
==4341== HEAP SUMMARY:
==4341== in use at exit: 85 bytes in 3 blocks
==4341== total heap usage: 14 allocs, 11 frees, 199,277 bytes allocated
==4341==
==4341== LEAK SUMMARY:
==4341== definitely lost: 0 bytes in 0 blocks
==4341== indirectly lost: 0 bytes in 0 blocks
==4341== possibly lost: 0 bytes in 0 blocks
==4341== still reachable: 85 bytes in 3 blocks
==4341== suppressed: 0 bytes in 0 blocks
==4341== Rerun with --leak-check=full to see details of leaked memory
==4341==
==4341== For lists of detected and suppressed errors, rerun with: -s
==4341== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
/tmp/nix-shell-4318-0/rc: line 1: 4341 Segmentation fault (core dumped) valgrind target/debug/jit-ffi-fault-mre
There are two issues that I know of with this code.
One comes from this comment in the documentation of call_function:
32-bit displacement sign extended to 64-bits in 64-bit mode.
WARNING: The location of emitted code may be such that if it is more than 2Gb away from common library function calls (eg printf); it may be preferrable to use an absolute address indirectly in this case, eg call_Register64Bit or call_Any64BitMemory.
That is a fancy way of saying that the operand of this call is a 32-bit displacement relative to the current rip (the address of the instruction following the call).
Code that is compiled and linked together usually ends up close enough for that to suffice. A Rust-compiled function and a dynamically allocated anonymous map, however, come with no guarantee of being within 2 GB of each other.
For example, on my system foo is at address 0x559a8039f5a0, while the anonymous memory is at 0x40000000. That is more than 87657 GiB away!
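The arithmetic behind that limit can be sketched in plain Rust. This is a minimal illustration, not the assembler crate's code, and `encode_call_rel32` is a hypothetical helper. It encodes `call rel32` (opcode 0xE8 plus a signed 32-bit displacement measured from the end of the 5-byte instruction) and reports `None` when the target is out of range. It assumes the call is the first instruction in the map, as in the MRE.

```rust
/// `call rel32` is opcode 0xE8 followed by a 4-byte signed displacement.
const CALL_REL32_LEN: i128 = 5;

/// Hypothetical helper: encodes `call rel32` placed at `call_site` targeting `target`.
/// Returns None when the target is outside the signed 32-bit (about +/-2 GiB) range.
fn encode_call_rel32(call_site: u64, target: u64) -> Option<[u8; 5]> {
    // The displacement is relative to rip after the call, i.e. call_site + 5.
    let disp = target as i128 - (call_site as i128 + CALL_REL32_LEN);
    if disp < i32::MIN as i128 || disp > i32::MAX as i128 {
        return None;
    }
    let d = (disp as i32).to_le_bytes();
    Some([0xE8, d[0], d[1], d[2], d[3]])
}

fn main() {
    // Nearby target: encodable (displacement 0x2000 - 0x1005 = 0xffb).
    println!("{:x?}", encode_call_rel32(0x1000, 0x2000));

    // Addresses quoted in this answer (assumption: the call sits at the map's start).
    let (map_addr, foo_addr) = (0x4000_0000u64, 0x559a_8039_f5a0u64);
    println!("{:x?}", encode_call_rel32(map_addr, foo_addr)); // None: too far
    println!("distance: ~{} GiB", foo_addr.abs_diff(map_addr) >> 30);
}
```

An absolute indirect call sidesteps the range check entirely, because the full 64-bit address lives in a register.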
The solution is to do as the documentation instructs and make a 64-bit absolute indirect call, for example by loading the address into rax and using call_Register64Bit.
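The other problem is that the x86_64 System V ABI requires the stack pointer to be 16-byte aligned at every call instruction. As a result, every function expects rsp % 16 == 8 on entry, because the call has just pushed an 8-byte return address. Your JIT code is entered that way. If it then calls foo right away, that second call pushes another 8 bytes and foo starts with a misaligned stack. Code that depends on the alignment, such as SSE movaps stores to stack slots, typically dies with exactly the kind of general protection fault shown above.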
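A toy model that tracks only rsp makes the off-by-8 visible. The helper names and the starting address are hypothetical. The model assumes Rust's `main` has an aligned rsp right before it calls the JIT function, and it treats `call` and `push` as each moving rsp down by 8 bytes.

```rust
/// `call` pushes an 8-byte return address.
fn after_call(rsp: u64) -> u64 {
    rsp - 8
}

/// `push` of one 64-bit register also moves rsp down by 8 bytes.
fn after_push(rsp: u64) -> u64 {
    rsp - 8
}

/// SysV x86-64 rule: rsp % 16 == 0 right before `call`,
/// so the callee must observe rsp % 16 == 8 on entry.
fn entry_ok(rsp_at_entry: u64) -> bool {
    rsp_at_entry % 16 == 8
}

fn main() {
    // Hypothetical aligned rsp in Rust's main just before it calls f().
    let rsp_in_main: u64 = 0x7fff_ffff_e000;
    let jit_entry = after_call(rsp_in_main);
    let foo_direct = after_call(jit_entry); // MRE: call foo straight away
    let foo_padded = after_call(after_push(jit_entry)); // dummy push, then call

    println!("JIT entry aligned: {}", entry_ok(jit_entry));
    println!("foo entry aligned without padding: {}", entry_ok(foo_direct));
    println!("foo entry aligned with one push: {}", entry_ok(foo_padded));
}
```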
To fix this, a function that calls other functions has to realign the stack first. If it has local automatic storage, it does so by reserving a number of bytes that is a multiple of 16 plus 8. Functions without automatic storage, such as yours, usually just do a dummy push at the start and a matching pop at the end.
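The "multiple of 16 plus 8" rule is just rounding up. In this sketch, `frame_reservation` is a hypothetical helper. It assumes rsp % 16 == 8 on entry and no other pushes (for example, no rbp frame pointer). It computes the smallest `sub $N, %rsp` that both fits the locals and leaves rsp 16-byte aligned.

```rust
/// Smallest reservation N >= `locals` with N % 16 == 8, so that
/// (rsp_at_entry - N) is 16-byte aligned when rsp_at_entry % 16 == 8.
fn frame_reservation(locals: u64) -> u64 {
    // Round up to the next value of the form 16k + 8.
    (locals + 7) / 16 * 16 + 8
}

fn main() {
    let rsp_at_entry: u64 = 0x7fff_ffff_dff8; // hypothetical, rsp % 16 == 8
    for &locals in &[0u64, 1, 8, 9, 24, 25, 100] {
        let n = frame_reservation(locals);
        // After the reservation the stack is aligned and has room for the locals.
        assert_eq!((rsp_at_entry - n) % 16, 0);
        assert!(n >= locals);
        println!("{:>3} bytes of locals -> sub ${}, %rsp", locals, n);
    }
}
```

The dummy push is the zero-locals case: one 8-byte push has the same effect on alignment as `sub $8, %rsp`.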
The working code would be something like:
// push %rax  (dummy push: realigns rsp to 16 bytes before the call)
instr_stream.push_Register64Bit_r64(Register64Bit::RAX);
// movabs $foo, %rax  (load the absolute 64-bit address of foo)
instr_stream.mov_Register64Bit_Immediate64Bit(
    Register64Bit::RAX,
    Immediate64Bit(foo as i64)
);
// call *%rax  (indirect call, not limited to +/-2 GB)
instr_stream.call_Register64Bit(Register64Bit::RAX);
// movabs $0x123456789abcdef0, %rax  (return value)
instr_stream.mov_Register64Bit_Immediate64Bit(Register64Bit::RAX, Immediate64Bit(0x123456789abcdef0));
// pop %rcx  (undo the dummy push; rcx is caller-saved, so clobbering it is fine)
instr_stream.pop_Register64Bit_r64(Register64Bit::RCX);
// ret
instr_stream.ret();
instr_stream.finish();
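To try the same instruction sequence without the assembler crate, here is a std-only sketch. It hand-encodes those six instructions into bytes, writes them to an anonymous page, flips the page to executable and calls it. Several assumptions apply:

- It targets Linux on x86_64 only.
- It declares `mmap`/`mprotect`/`munmap` and their Linux constant values directly instead of going through the libc crate.
- It swaps the `static mut` for an `AtomicU32`.
- `run_jit`, `movabs_rax` and `CALLS` are hypothetical names for illustration.

```rust
use std::ffi::c_void;
use std::sync::atomic::{AtomicU32, Ordering};

// Linux x86_64 values (assumption: declared by hand instead of using the libc crate).
const PROT_READ: i32 = 0x1;
const PROT_WRITE: i32 = 0x2;
const PROT_EXEC: i32 = 0x4;
const MAP_PRIVATE: i32 = 0x02;
const MAP_ANONYMOUS: i32 = 0x20;

unsafe extern "C" {
    fn mmap(addr: *mut c_void, len: usize, prot: i32, flags: i32, fd: i32, offset: i64) -> *mut c_void;
    fn mprotect(addr: *mut c_void, len: usize, prot: i32) -> i32;
    fn munmap(addr: *mut c_void, len: usize) -> i32;
}

/// Global state touched from JIT'ed code (an atomic instead of `static mut`).
static CALLS: AtomicU32 = AtomicU32::new(0);

extern "C" fn foo() {
    CALLS.fetch_add(1, Ordering::SeqCst);
    println!("hello from foo"); // needs the 16-byte aligned stack
}

/// Appends `movabs $imm, %rax`: REX.W + B8, then a little-endian imm64.
fn movabs_rax(code: &mut Vec<u8>, imm: u64) {
    code.extend_from_slice(&[0x48, 0xB8]);
    code.extend_from_slice(&imm.to_le_bytes());
}

/// Builds, maps and calls the JIT function; returns (rax, number of calls to foo).
fn run_jit() -> (i64, u32) {
    let mut code: Vec<u8> = Vec::new();
    code.push(0x50); // push %rax (realign rsp)
    movabs_rax(&mut code, foo as usize as u64); // movabs $foo, %rax
    code.extend_from_slice(&[0xFF, 0xD0]); // call *%rax
    movabs_rax(&mut code, 0x1234_5678_9abc_def0); // movabs $0x123456789abcdef0, %rax
    code.push(0x59); // pop %rcx (undo the push)
    code.push(0xC3); // ret

    let len = 4096;
    unsafe {
        let mem = mmap(std::ptr::null_mut(), len, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        assert!(mem as isize != -1, "mmap failed");
        std::ptr::copy_nonoverlapping(code.as_ptr(), mem as *mut u8, code.len());
        // Make the page executable (and no longer writable) before running it.
        assert_eq!(mprotect(mem, len, PROT_READ | PROT_EXEC), 0, "mprotect failed");
        let f: extern "C" fn() -> i64 = std::mem::transmute(mem);
        let before = CALLS.load(Ordering::SeqCst);
        let ret = f();
        let calls = CALLS.load(Ordering::SeqCst) - before;
        munmap(mem, len);
        (ret, calls)
    }
}

fn main() {
    let (ret, calls) = run_jit();
    println!("returned {:#x}, foo called {} time(s)", ret, calls);
}
```

Mapping the page writable first and only then executable, rather than requesting RWX up front, tends to work on systems that restrict writable-and-executable mappings.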