Tags: c++, assembly, x86-64, jit

Where to store code constants when writing a JIT compiler?


I am writing a JIT compiler for x86-64 and have a question about best practice for including constants in the machine code I generate.

My approach thus far is straightforward:

  • Allocate a chunk of RW memory with VirtualAlloc or mmap
  • Copy the machine code into that region.
  • Mark the pages executable with VirtualProtect or mprotect, removing write permission for security.
  • Execute.
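On POSIX systems, the four steps above might look roughly like the following. This is a minimal sketch that assumes Linux or macOS on x86-64; `JitBuffer`, `make_executable` and `release` are hypothetical names, and on Windows `VirtualAlloc`/`VirtualProtect` would take the place of `mmap`/`mprotect`.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical handle for one block of JIT-compiled code.
struct JitBuffer {
    void* mem = nullptr;
    std::size_t size = 0;
};

// mprotect works on whole pages, so round the allocation up to the page size.
static std::size_t round_to_page(std::size_t n) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    return (n + page - 1) / page * page;
}

JitBuffer make_executable(const unsigned char* code, std::size_t len) {
    assert(code != nullptr && len > 0);
    JitBuffer buf;
    buf.size = round_to_page(len);
    // 1. Allocate a chunk of read/write memory.
    buf.mem = mmap(nullptr, buf.size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf.mem == MAP_FAILED) throw std::runtime_error("mmap failed");
    // 2. Copy the generated machine code into it.
    std::memcpy(buf.mem, code, len);
    // 3. Switch to read+execute, dropping write access (W^X).
    if (mprotect(buf.mem, buf.size, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf.mem, buf.size);
        throw std::runtime_error("mprotect failed");
    }
    return buf;  // 4. The caller casts buf.mem to a function pointer and calls it.
}

void release(JitBuffer& buf) {
    if (buf.mem != nullptr) munmap(buf.mem, buf.size);
    buf.mem = nullptr;
    buf.size = 0;
}
```

For example, passing the six bytes `B8 2A 00 00 00 C3` (`mov eax, 42` followed by `ret`) and casting `buf.mem` to `int (*)()` gives a callable function that returns 42.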

When generating code, I have to include constants (numbers, strings), and I am not sure what the best way to do this is. I have several approaches in mind:

  • Encode all constants as immediate operands of the instructions. This seems like a bad idea for everything except perhaps small scalar values.
  • Allocate a separate memory region for constants. This seems like the best idea to me, but it slightly complicates memory management and the compilation workflow: I have to know the region's address before I can start emitting the executable code. I am also not sure whether the worse memory locality would hurt performance.
  • Store the constants in the same region as the code and access them with RIP-relative addressing. I like this approach because it keeps related parts of the program together, but I feel slightly uneasy about mixing instructions and data.
  • Something completely different?
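For the first option, the encoding limits matter. Most x86-64 instructions accept at most a 32-bit immediate (sign-extended to 64 bits), `mov r64, imm64` is the main exception, and no instruction loads a `double` immediate into an XMM register. The following is a minimal sketch with hypothetical helper names showing how a 64-bit integer and a `double` would have to be materialized:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Append the low `nbytes` bytes of `v` in little-endian order (x86 immediates are little-endian).
static void put_le(std::vector<uint8_t>& out, uint64_t v, int nbytes) {
    assert(nbytes >= 1 && nbytes <= 8);
    for (int i = 0; i < nbytes; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// mov eax, imm32 : B8 id  (writing eax also zero-extends into rax)
void emit_mov_eax_imm32(std::vector<uint8_t>& out, uint32_t imm) {
    out.push_back(0xB8);
    put_le(out, imm, 4);
}

// mov rax, imm64 : REX.W (48) B8 io  -- the only common form with a full 64-bit immediate
void emit_mov_rax_imm64(std::vector<uint8_t>& out, uint64_t imm) {
    out.push_back(0x48);
    out.push_back(0xB8);
    put_le(out, imm, 8);
}

// A double has to travel through a general-purpose register:
//   mov rax, <bit pattern> ; movq xmm0, rax  (66 REX.W 0F 6E /r, ModRM C0)
void emit_load_double_xmm0(std::vector<uint8_t>& out, double d) {
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    emit_mov_rax_imm64(out, bits);
    const uint8_t movq_xmm0_rax[] = {0x66, 0x48, 0x0F, 0x6E, 0xC0};
    out.insert(out.end(), movq_xmm0_rax, movq_xmm0_rax + sizeof movq_xmm0_rax);
}
```

Every use repeats the full constant in the instruction stream (15 bytes for the `double` load above), and strings cannot be handled this way at all, which is why immediates tend to pay off only for small integers.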
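For the separate-region option, the address problem largely goes away if the pool is allocated before code generation starts and is never reallocated, so each constant's address is fixed as soon as it is added. The following sketch works under those assumptions. `ConstantPool` is a hypothetical class, and it uses a plain heap buffer for simplicity, whereas a real JIT might make the pool read-only once it is filled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical constant pool with a fixed capacity chosen up front. The buffer is never
// resized, so an address returned by add() stays valid for the pool's lifetime and can be
// baked into generated code immediately.
class ConstantPool {
public:
    explicit ConstantPool(std::size_t capacity) : storage_(capacity) {}

    // Copies `size` bytes into the pool at an address aligned to `align` (a power of two).
    const void* add(const void* data, std::size_t size, std::size_t align) {
        assert(align != 0 && (align & (align - 1)) == 0);
        const auto base = reinterpret_cast<std::uintptr_t>(storage_.data());
        const std::uintptr_t addr =
            (base + used_ + align - 1) & ~static_cast<std::uintptr_t>(align - 1);
        const std::size_t offset = addr - base;
        if (offset + size > storage_.size()) throw std::length_error("constant pool is full");
        std::memcpy(storage_.data() + offset, data, size);
        used_ = offset + size;
        return storage_.data() + offset;
    }

    const void* add_double(double d) { return add(&d, sizeof d, alignof(double)); }

    // Stores a NUL-terminated copy of `s`.
    const char* add_string(const char* s) {
        return static_cast<const char*>(add(s, std::strlen(s) + 1, 1));
    }

private:
    std::vector<unsigned char> storage_;
    std::size_t used_ = 0;
};
```

A returned address can then be embedded in the code with `mov rax, imm64`, or reached with a RIP-relative operand if the pool happens to lie within ±2 GiB of the code.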
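For the RIP-relative option, the main rule to get right is that the 32-bit displacement is measured from the end of the current instruction, not its start. The sketch below uses hypothetical names and assumes the encoding `movsd xmm0, [rip+disp32]` = `F2 0F 10 05` followed by disp32. It lays out a function that returns a `double` stored right after its `ret`:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <stdexcept>
#include <vector>

// disp32 for a RIP-relative operand: target minus the address of the *next* instruction.
// All positions are byte offsets within the same code block, so the block's final load
// address does not matter.
int32_t rip_disp32(int64_t instr_start, int64_t instr_length, int64_t target) {
    assert(instr_length > 0);
    const int64_t disp = target - (instr_start + instr_length);
    if (disp < std::numeric_limits<int32_t>::min() || disp > std::numeric_limits<int32_t>::max())
        throw std::out_of_range("target is out of reach of a 32-bit displacement");
    return static_cast<int32_t>(disp);
}

// Lays out `double f() { return value; }` with the constant stored right after the code:
//   offset 0:  movsd xmm0, [rip+disp32]   F2 0F 10 05 <disp32>   (8 bytes)
//   offset 8:  ret                        C3
//   offset 9:  int3 padding up to an 8-byte boundary
//   offset 16: the 8 bytes of `value`
std::vector<uint8_t> build_return_constant(double value) {
    std::vector<uint8_t> code = {0xF2, 0x0F, 0x10, 0x05, 0, 0, 0, 0, 0xC3};
    while (code.size() % 8 != 0) code.push_back(0xCC);  // padding, never executed
    const int64_t const_offset = static_cast<int64_t>(code.size());
    const int32_t disp = rip_disp32(/*instr_start=*/0, /*instr_length=*/8, const_offset);
    std::memcpy(&code[4], &disp, sizeof disp);  // assumes a little-endian (x86-64) host
    uint8_t bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);
    code.insert(code.end(), bytes, bytes + sizeof value);
    return code;
}
```

Because the displacement is relative, these bytes remain valid wherever the whole block is copied, for example into a buffer prepared with `mmap`/`mprotect`. The padding between code and data is never executed, and `int3` (`0xCC`) is a common choice for it.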

What is the preferable way to go about this?


Solution

  • A lot depends on how you are generating your binary code. If you use a JIT assembler that handles labels and computes offsets for you, things are pretty easy: you can put the constants in a block after the end of the code, reference them with PC-relative addressing through those labels, and end up with a single block of bytes containing both the code and the constants (easy management). If you are emitting raw machine code directly, you already have to solve the problem of forward PC-relative references (e.g. for forward branches). If you handle those with back-patching, you need to extend that mechanism to cover references into your constants block.

    You can avoid the PC-relative offset calculations by putting the constants in a separate block and passing the address of that block as a parameter to your code. This is essentially the "allocate a separate memory region for constants" option you proposed, and you don't need to know the block's address at code generation time if you pass it in as an argument.
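The back-patching approach can be sketched as a small buffer that records a fixup for every RIP-relative reference to a constant. It patches the displacements once the constant block has been appended after the code. All names are hypothetical, alignment is computed relative to the start of the buffer (so it assumes the block is later copied to a page-aligned address), and the ±2 GiB range check is omitted because small JIT blocks do not come close to it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <initializer_list>
#include <utility>
#include <vector>

// Hypothetical code buffer with back-patched references to a trailing constant block.
class CodeBuffer {
public:
    // Registers a constant and returns its id; its final offset is not known yet.
    int add_constant(const void* data, std::size_t size, std::size_t align) {
        assert(align != 0 && (align & (align - 1)) == 0);
        Constant c;
        const auto* p = static_cast<const uint8_t*>(data);
        c.bytes.assign(p, p + size);
        c.align = align;
        constants_.push_back(std::move(c));
        return static_cast<int>(constants_.size() - 1);
    }

    void emit(std::initializer_list<uint8_t> bytes) { code_.insert(code_.end(), bytes); }

    // Emits a placeholder disp32 that should point at constant `id`. `bytes_after` is the
    // number of instruction bytes following the disp32 (e.g. a trailing imm8), usually 0.
    void emit_rip_disp32(int id, int bytes_after = 0) {
        fixups_.push_back({code_.size(), id, bytes_after});
        emit({0, 0, 0, 0});
    }

    // Appends the constants after the code and patches every recorded displacement.
    std::vector<uint8_t> finalize() const {
        std::vector<uint8_t> out = code_;
        std::vector<std::size_t> offsets;
        for (const Constant& c : constants_) {
            while (out.size() % c.align != 0) out.push_back(0xCC);  // int3 padding
            offsets.push_back(out.size());
            out.insert(out.end(), c.bytes.begin(), c.bytes.end());
        }
        for (const Fixup& f : fixups_) {
            // RIP points at the end of the instruction, i.e. after disp32 and any trailing bytes.
            const int64_t next_instr = static_cast<int64_t>(f.disp_offset + 4 + f.bytes_after);
            const int32_t disp =
                static_cast<int32_t>(static_cast<int64_t>(offsets[f.constant_id]) - next_instr);
            std::memcpy(&out[f.disp_offset], &disp, sizeof disp);
        }
        return out;
    }

private:
    struct Constant { std::vector<uint8_t> bytes; std::size_t align = 1; };
    struct Fixup { std::size_t disp_offset; int constant_id; int bytes_after; };
    std::vector<uint8_t> code_;
    std::vector<Constant> constants_;
    std::vector<Fixup> fixups_;
};
```

For instance, emitting `48 8D 15` (`lea rdx, [rip+disp32]`) followed by `emit_rip_disp32(id)` leaves a hole that `finalize()` fills in so that `rdx` ends up pointing at the constant, however much code is emitted after the reference.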
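Passing the constant block as an argument avoids displacement arithmetic entirely, because the generated code addresses constants relative to a base register. The following is a minimal sketch that assumes Linux or macOS on x86-64 with the System V calling convention: the first pointer argument arrives in `rdi` and a `double` is returned in `xmm0`. On Windows x64, the first argument would arrive in `rcx` instead. `JitFn`, `jit_load_constant` and `free_jit` are hypothetical names.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

using ConstFn = double (*)(const double* constants);

// Hypothetical handle for the generated function and the memory it lives in.
struct JitFn {
    void* mem;
    std::size_t size;
    ConstFn fn;
};

// Generates code equivalent to  double f(const double* k) { return k[index]; }
//   movsd xmm0, [rdi + disp8]   F2 0F 10 47 <disp8>
//   ret                         C3
JitFn jit_load_constant(int index) {
    if (index < 0 || index > 15) throw std::out_of_range("a signed disp8 only reaches k[0..15]");
    const unsigned char code[] = {0xF2, 0x0F, 0x10, 0x47,
                                  static_cast<unsigned char>(index * 8), 0xC3};
    void* mem = mmap(nullptr, sizeof code, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) throw std::runtime_error("mmap failed");
    std::memcpy(mem, code, sizeof code);
    if (mprotect(mem, sizeof code, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, sizeof code);
        throw std::runtime_error("mprotect failed");
    }
    return {mem, sizeof code, reinterpret_cast<ConstFn>(mem)};
}

void free_jit(JitFn& f) {
    munmap(f.mem, f.size);
    f.mem = nullptr;
    f.fn = nullptr;
}
```

Because the code only encodes offsets, the same compiled function works with any constant block the caller passes in, as long as the layout matches.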