Tags: bytecode, interpreter, jit, cil

How do JIT interpreters handle variable names?


Let's say I am to design a JIT interpreter that translates IL or bytecode to executable instructions at runtime. Every time a variable name is encountered in the code, the JIT interpreter has to translate that into the respective memory address, right?

What technique do JIT interpreters use in order to resolve variable references in a performant enough manner? Do they use hashing, are the variables compiled to addresses ahead of time, or am I missing something altogether?


Solution

  • There is a huge variety of possible answers to this question, just as there is a huge variety of ways to design a JIT in general.

    But to take one example, consider the JVM. Java bytecode does not contain variable names at all, except in optional debugging/reflection metadata. Instead, the compiler assigns each local variable a numeric slot index (0 to 65535), and bytecode instructions refer to locals by that index. The VM is then free to optimize further if it wants to. For example, it may convert everything into SSA form and then compile it into machine code, in which case variables end up in machine registers, at fixed offsets in the stack frame, or optimized away entirely.
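    To make the slot-index idea concrete, here is a minimal sketch (in Python, with invented opcode names; this is not real JVM bytecode) of a stack machine whose locals are a flat array indexed by number, with no names anywhere at runtime:

```python
# Minimal sketch of index-based local variable slots, in the style of
# JVM bytecode. Opcode names are invented for illustration.

PUSH, LOAD, STORE, ADD = "PUSH", "LOAD", "STORE", "ADD"

def run(code, num_locals):
    """Execute a list of (opcode, operand) pairs on a stack machine."""
    slots = [0] * num_locals   # fixed-size slot array; no names involved
    stack = []
    for op, arg in code:
        if op == PUSH:
            stack.append(arg)
        elif op == LOAD:       # read a local by numeric index
            stack.append(slots[arg])
        elif op == STORE:      # write a local by numeric index
            slots[arg] = stack.pop()
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack

# Roughly: x = 2; y = 3; x + y, where the compiler assigned
# x -> slot 0 and y -> slot 1 ahead of time.
program = [
    (PUSH, 2), (STORE, 0),
    (PUSH, 3), (STORE, 1),
    (LOAD, 0), (LOAD, 1), (ADD, None),
]
print(run(program, num_locals=2))  # -> [5]
```

    The name-to-index mapping only ever exists in the compiler; the interpreter (or a JIT consuming this code) sees nothing but array offsets.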

    Consider another example: CPython. Python does maintain variable names at runtime, due to its highly dynamic nature. However, the interpreter still performs a few optimizations. For example, classes with a __slots__ attribute allocate a fixed-size array for their fields and use a name -> index hashmap only for dynamic lookups. I am not familiar with every detail of the implementation, but it does something similar with local variables: ordinary local variable accesses (those not going through reflection) are converted to a fixed index at "compile" time, when the source is translated to bytecode.
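    You can observe both behaviors directly with CPython's standard `dis` module (the class and function below are just examples made up for the demonstration):

```python
import dis

class Point:
    # Fields live at fixed offsets in the instance, not in a
    # per-instance __dict__ hashmap.
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x = x
        self.y = y

def f(a, b):
    c = a + b   # a, b, c become numbered *_FAST slots in the bytecode
    return c

# Local variable reads/writes compile to LOAD_FAST/STORE_FAST (or
# fused variants on newer versions), which index a flat array by number.
opnames = [ins.opname for ins in dis.get_instructions(f)]
print(opnames)

p = Point(1, 2)
print(hasattr(p, "__dict__"))  # __slots__ removes the per-instance dict
```

    So even in a highly dynamic language, the common case of a local variable access is resolved to an array index before the code ever runs; the name-based hashmap path is reserved for the dynamic cases.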

    So in short, the answer to

    Do they use hashing, are the variables compiled to addresses ahead of time, or am I missing something altogether?

    is yes.