c++ virtual-machine jit

How can you replicate intermediate bytecode JIT compilation in C++?


Let's say I am writing my own virtual machine in C++ that executes programs written in my own intermediate bytecode.

The compiled IL is then loaded into the VM and executed. How would one go about taking a list of instructions and then actually storing them as native instructions in memory?

By this, I mean: if I had an array of bytes that were all opcodes, and I looped through them sequentially, using a switch statement to handle execution, how could I compile that to native code 'just in time' and then run the native version each time that particular sequence of instructions is executed?
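For concreteness, the kind of loop I have in mind looks roughly like the sketch below. The opcode names (PUSH, ADD, MUL, HALT) and the one-byte immediate encoding are made up for illustration and are not taken from any real VM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcodes for a tiny stack machine (illustration only).
enum Op : std::uint8_t { PUSH, ADD, MUL, HALT };

// Decodes and dispatches every instruction again on every run.
std::int64_t interpret(const std::vector<std::uint8_t>& code) {
    std::vector<std::int64_t> stack;
    std::size_t pc = 0;
    while (pc < code.size()) {
        switch (code[pc++]) {
        case PUSH:
            assert(pc < code.size());
            stack.push_back(code[pc++]);  // one-byte immediate operand
            break;
        case ADD: {
            assert(stack.size() >= 2);
            std::int64_t b = stack.back(); stack.pop_back();
            stack.back() += b;
            break;
        }
        case MUL: {
            assert(stack.size() >= 2);
            std::int64_t b = stack.back(); stack.pop_back();
            stack.back() *= b;
            break;
        }
        case HALT:
            assert(!stack.empty());
            return stack.back();
        default:
            assert(false && "unknown opcode");
            return 0;
        }
    }
    assert(!stack.empty());
    return stack.back();  // fell off the end without HALT
}
```

With this encoding, the program PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT evaluates (2 + 3) * 4.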

I'm really confused as to how this actually works in virtual machines. How can a virtual machine compile intermediate instructions into native code and then store those native (asm?) instructions in memory to be executed each time the set of instructions is called?

I'm keen to understand the concept a bit further. Sorry if my understanding of low-level VM design is lacking. I don't understand how you can compile the results of a switch statement (in my case, anyway; I'm not sure how VMs actually do it) into native, compiled code.


Solution

  • Naive macro expansion, where each bytecode instruction is replaced by a fixed sequence of native instructions, produces very inefficient code. JIT compilers normally build an intermediate representation from the bytecode first (ideally an SSA form) and then apply the same compilation techniques as a standalone compiler, skipping only the most expensive ones. Translating stack-machine bytecode into SSA is a form of decompilation, so a register-machine bytecode can be somewhat more efficient to JIT (but is harder to produce).

    There is also another approach which is gaining popularity at the moment: tracing JITs. The highest tier of the JIT pipeline again does much the same thing as a standalone compiler, but only for a single linear trace of hot code recorded at run time (effectively one long basic block with side exits). In this approach, naive macro expansion is acceptable as a first-tier JIT.

    And if you really want to take your switch-based interpreter without any modifications, together with some bytecode, and turn the pair into compiled code, you should look at partial evaluation: specializing an interpreter with respect to a fixed program is known as the first Futamura projection. Supercompilation is a related but distinct technique.
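To make the "translate once, run many times" idea concrete, here is a minimal, portable sketch. It is not a real JIT: instead of emitting machine code it translates each bytecode instruction into a pre-decoded closure (sometimes called closure compilation), which stands in for the naive macro-expansion tier. A real first-tier JIT would instead append native instruction bytes to a buffer of executable memory (obtained with, for example, `mmap` and `PROT_EXEC` on POSIX or `VirtualAlloc` on Windows) and call it through a function pointer. The opcodes and the names `translate`, `run`, `Step` and `Stack` are hypothetical, chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical opcodes for a tiny stack machine (illustration only).
enum Op : std::uint8_t { PUSH, ADD, MUL, HALT };

using Stack = std::vector<std::int64_t>;
using Step  = std::function<void(Stack&)>;

// "Compile" once: decode every instruction and emit one pre-decoded step.
// This is where a real JIT would emit machine-code bytes instead.
std::vector<Step> translate(const std::vector<std::uint8_t>& code) {
    std::vector<Step> out;
    for (std::size_t pc = 0; pc < code.size();) {
        switch (code[pc++]) {
        case PUSH: {
            assert(pc < code.size());
            std::int64_t imm = code[pc++];  // operand is baked into the step
            out.push_back([imm](Stack& s) { s.push_back(imm); });
            break;
        }
        case ADD:
            out.push_back([](Stack& s) {
                assert(s.size() >= 2);
                std::int64_t b = s.back(); s.pop_back();
                s.back() += b;
            });
            break;
        case MUL:
            out.push_back([](Stack& s) {
                assert(s.size() >= 2);
                std::int64_t b = s.back(); s.pop_back();
                s.back() *= b;
            });
            break;
        case HALT:
            return out;  // nothing after HALT is translated
        default:
            assert(false && "unknown opcode");
            return out;
        }
    }
    return out;
}

// Run the translated form: no opcode decoding and no switch at run time.
std::int64_t run(const std::vector<Step>& compiled) {
    Stack s;
    for (const Step& step : compiled) step(s);
    assert(!s.empty());
    return s.back();
}
```

The VM would call `translate` once per bytecode sequence, cache the result, and call `run` on every later execution. The design point is the split between the one-time decoding pass and the repeated execution pass; the optimizing tiers described above replace the one-to-one translation with an IR and real code generation.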