I am reading the book Crafting Interpreters by Bob Nystrom. I am on Chapter 22, "Local Variables", where, under the section "Using Locals", he says:
"At runtime, we load and store locals using the stack slot index, so that’s what the compiler needs to calculate after it resolves the variable. Whenever a variable is declared, we append it to the locals array in Compiler. That means the first local variable is at index zero, the next one is at index one, and so on. In other words, the locals array in the compiler has the exact same layout as the VM’s stack will have at runtime. The variable’s index in the locals array is the same as its stack slot. How convenient!"
But how is this possible? The locals array at compile time stores each and every declaration, including those in sub-blocks, but at runtime many of those variables may already have been popped off, so the indexes will not be the same. Take the example below. Suppose I have code like this:
var x = 10
var y = 20
{
var z = 30
}
var a = 40
a = 50
At compile time my locals array would look like [(10, 0), (20, 0), (30, 1), (40, 0)], where the second entry of each tuple is the depth of the declaration. So when I generate the instruction for the last assignment, it will be something like STORE 3, because a is at index 3 in the above array.
However, at runtime my stack would look like [10, 20, 40] (the last element is the top of the stack), because z has already been popped off, so the index of a on the stack is actually 2!
How can the compile-time local indexes and the runtime stack indexes be the same?
I have been stuck on this point for a long time, and he relies on this fact in all the later chapters.
Variable scope and lifetime are two important concepts which are too often conflated. The scope of a variable is the region of program text in which the variable's name is visible; that is, it can be resolved to the current binding of the variable. The lifetime of a variable is the interval during program execution in which the variable's binding exists. These two concepts are not even in the same category, as the above summary indicates; scope is an attribute of an identifier in program text, whereas lifetime is an attribute of a binding in the execution of the program. (I use the word "binding" to distinguish it from "object", since "object" is often used as a synonym for "value". Lifetime refers to the place where the value is stored. Unless the binding is immutable, various values can be stored in the binding during the lifetime; in dynamically-typed languages, these can be different values of different types.)
Some languages are designed to conflate the two concepts, but in many languages they really are separate. In C, for example, a local variable exists from the time the block in which it is declared is entered until the moment in which the function returns. Its scope, on the other hand, doesn't start until the identifier is actually declared. That's true in Python as well (although there's no mechanism for declaring a local variable in a simple block, only in comprehensions, lambdas and function bodies). And JavaScript, whose variable semantics have evolved over the years, has a variety of different types of variables, each with different lifetime rules.
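A rough illustration in C (my own example, not from the book; exactly when storage is reserved is an implementation detail, but typically the whole frame is allocated at function entry):

#include <stdio.h>

int main(void) {
    /* In a typical implementation, storage for `x` already exists here
       (the frame was sized at function entry), but `x` is not yet in
       scope, so uncommenting the next line would be a compile error. */
    /* printf("%d\n", x); */    /* error: 'x' undeclared */

    int x = 10;                 /* the scope of `x` begins here */
    printf("%d\n", x);          /* prints 10 */
    return 0;
}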
In particular, Nystrom's language attempts to completely conflate lifetime and scope. That's possible because his language has no goto statement or other construct which would allow execution to move backwards in a block, and because all of his declarations are block-level and not function-level. There's a lot to be said for this design, but it remains important to understand the difference between scope and lifetime; scope is a feature of the parser (or compiler) which concerns tokens (identifiers) which may not even have a runtime manifestation. Lifetime is a feature of the state of execution. So even if they have a strong correlation, they are aspects of different parts of the language implementation.
It can actually be more efficient to extend the lifetime of a local variable to the entire block, because it avoids pushing and popping variable slots one at a time. In a common C implementation strategy, the stack is extended (with a single increment operation) at function entry to the size needed for all locals and temporaries used in the function body (excluding VLAs). When the function is exited, all of these stack slots can be removed with a simple decrement operation. Lua uses roughly the same strategy. But I believe Nystrom's implementation goes for a simpler architecture in which the stack is extended by one slot when the variable is declared, and then popped for each local variable when the block is exited. So in his language, lifetime is perfectly correlated with scope. However, variable reference still uses the known offset of the variable on the stack, which means that the parser and the runtime code need to agree on the sequence of pushes and pops.
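Here is a minimal sketch of that bookkeeping, loosely modeled on the book's clox compiler (the names and details are simplified and illustrative; the real compiler stores a Token for each local and checks for overflow, among other things). The point is that the compiler's locals array and the runtime stack grow and shrink in lockstep:

#include <stdio.h>

#define OP_POP 0                 /* stand-in opcode value, just for the trace */

static void emitByte(int op) {   /* stub: a real compiler appends to a bytecode chunk */
    printf("emit opcode %d (OP_POP)\n", op);
}

typedef struct {
    const char *name;            /* identifier text */
    int depth;                   /* scope depth at the point of declaration */
} Local;

typedef struct {
    Local locals[256];           /* compile-time mirror of the runtime stack */
    int localCount;              /* next free entry == next free stack slot */
    int scopeDepth;
} Compiler;

/* Declaring a variable appends it; its array index is its future stack slot. */
static int addLocal(Compiler *c, const char *name) {
    c->locals[c->localCount].name = name;
    c->locals[c->localCount].depth = c->scopeDepth;
    return c->localCount++;
}

static void beginScope(Compiler *c) { c->scopeDepth++; }

/* Leaving a block removes its locals from the array *and* emits one pop per
   removed local, so the array and the runtime stack shrink together. */
static void endScope(Compiler *c) {
    c->scopeDepth--;
    while (c->localCount > 0 &&
           c->locals[c->localCount - 1].depth > c->scopeDepth) {
        emitByte(OP_POP);
        c->localCount--;
    }
}

int main(void) {
    Compiler c = {0};
    printf("x -> slot %d\n", addLocal(&c, "x"));  /* 0 */
    printf("y -> slot %d\n", addLocal(&c, "y"));  /* 1 */
    beginScope(&c);
    printf("z -> slot %d\n", addLocal(&c, "z"));  /* 2 */
    endScope(&c);                                 /* emits one OP_POP */
    printf("a -> slot %d\n", addLocal(&c, "a"));  /* 2 again: slot reused */
    return 0;
}

Running this prints slot 0 for x, 1 for y, 2 for z, and then 2 again for a, because endScope removed z from the compiler's array at the same point at which the emitted OP_POP will remove it from the runtime stack.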
During the lifetime of a variable binding, its stack slot is definitely allocated to it. But that doesn't mean that that stack slot is allocated to the local variable for the entire duration of the function execution. It's quite possible for two variables to use the same stack slot, as long as their lifetimes are disjoint. And that's the case with the example code you provide:
var x = 10   # PUSH  # Stack: [x]
var y = 20   # PUSH  # Stack: [x] [y]
{
var z = 30   # PUSH  # Stack: [x] [y] [z]
}            # POP   # Stack: [x] [y]
var a = 40   # PUSH  # Stack: [x] [y] [a]
a = 50       #       # Stack: [x] [y] [a]  (slot 2 overwritten; no push or pop)
So z and a, whose lifetimes are disjoint, share the use of stack slot 2.
As mentioned previously, this is only possible because there is no way for flow of control to back up after a is declared. If the above code were part of a looping construct, such as a while statement, repetition of the loop would create a new block; the a in each loop execution is a different binding from any previous a, although they are at the same stack offset. In any event, the use of a previous to its declaration would be a reference to some a in an outer scope (not shown here) or a global variable. So even if a were used in the block which defines z, there is no conflict.
In order for the correct VM code to be emitted, the parser needs to decrement its view of the size of the stack when it notices that a block is finished. That's also explained by Nystrom.
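To make that concrete, here is roughly what a clox-style compiler emits for the code in the question, assuming it appears inside a function body so that the variables really are locals. (Opcode names follow the book; OP_CONSTANT's operand is actually a constant-table index, but the literal value is shown here for readability.)

var x = 10   # OP_CONSTANT 10                          # value stays on the stack as slot 0
var y = 20   # OP_CONSTANT 20                          # slot 1
{
var z = 30   # OP_CONSTANT 30                          # slot 2
}            # OP_POP                                  # compiler also removes z from its locals array
var a = 40   # OP_CONSTANT 40                          # slot 2 again, reusing z's slot
a = 50       # OP_CONSTANT 50, OP_SET_LOCAL 2, OP_POP  # the assignment targets slot 2, not 3

The OP_POP after OP_SET_LOCAL comes from compiling a = 50 as an expression statement; the OP_POP at the closing brace is the one that discards z, and it is emitted at the same moment the compiler drops z from its locals array, which is why a ends up in slot 2 rather than 3.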