Search code examples
compiler-constructioninterpreter

How to track the index of variables in an interpreter


I'm creating an interpreter (a bytecode interpreter, so also a compiler) and I've found a problem I just can't solve. I need to store variables somewhere. Storing them in a dictionary and looking them up at runtime would be way to slow, so I'd like to store them in registers and use their indexes instead of their name.

So at compile time I give every variable an index, and create an array of registers. That's fine for monolithic scoped languages. But the language I'm creating the interpreter for has nested scopes (and function calls). So another approach could be that I have a set of global registers, and a stack of register lists for function calls. So my virtual machine would have something like:

Register globalRegisters[NUMBER_OF_GLOBALS];
Stack<Register[]> callstack;

But there's another thing. My language allows functions inside functions. Example:

var x = 1;
function foo() {
    y = 2;
    function bar() {
        z = 3;
        y = y - 1;
    }
}

Function bar() refers to a variable that belongs to foo(). So that means that the virtual machine would have to look at the register list under the top one on the stack. But what if bar() is recursive? What if the number of recursions are defined by user input? Then the virtual machine just wouldn't know how many stack elements does it have to go under to find the set of registers containing the value of y.

What could be an effective solution for this problem? This is the first time I'm dealing with registers, calculations happen on a value stack.


Solution

  • The usual way to represent closures is to create a struct that contains both the function pointer itself as well as its environment. The simplest way to represent the environment would be as a pointer to the stack frame of the outer function. The inner function could then simply dereference that pointer (with the offset of the given variable of course) when accessing variables of the outer function.

    However there's another problem you have to consider: What if foo returns bar and then bar is called after foo already returned? In this case a pointer to foo's stack frame would be invalid as that stack frame would no longer exist at that point. So if you want to allow that scenario (instead of simply making it illegal to return functions), you need another solution.

    One common solution would be to represent all local variables as pointers to heap-allocated values and storing copies of all those pointers in the function's struct.

    Another solution would be to restrict closures, so that variables of the outer function can only be accessed by the inner function if they're never re-assigned. That's what closures in Java 8 do. Then the struct could simply contain copies of the variables instead of pointers.