compiler-construction virtual-machine programming-languages calling-convention

How to resolve the conundrum of calling a function in a custom programming language?

I have been thinking for many months on and off how to solve the problem of calling a function in a custom programming language. There is this strange thing of an infinite set of recursive calls to the same function that I am having difficulty getting mentally beyond.

I'll illustrate like this. Say you are calling some function like doFoo(1, 2), and you now have to implement this. What happens (in my mind) is, you first push the variables onto the stack, and then do your jump to the function. But let's focus on this first step, pushing the variables onto the stack.

What you do is create a stack frame, and push that onto the stack. But in order to create a stack frame, you need to allocate space in memory. So to create the stack frame, you first call your custom allocateMemory(size) sort of function. Now this function needs to push its variables onto the stack, creating a stack frame.... So you allocate space on the stack for this stack frame by calling allocateMemory(size) from within the original allocateMemory(size) function. But then it happens again! And again and again and again. Everything needs to allocate memory, but the memory allocation requires pushing onto the stack, which requires memory allocation, which requires pushing onto the stack.... Etc.

So the way I just thought of to potentially resolve this is by thinking about this operation of "pushing onto the stack" as an atomic primitive operation. Essentially, from the "higher up" level of abstraction, pushing onto the stack is one step only. Like I imagine with assembly creating the stack frame under the hood however it does, it is implemented in the hardware (I think?), so you just do push <value> and the rest is abstracted away in a lower level system.

So thinking like that sort of solves the problem, but not quite.

For the purposes of this question, I am building a custom language interpreter to run in the browser. Essentially it will work like a VM interpreting byte code. Say we have bytecode that has the equivalent command of push <value>, which creates a stack frame (somehow). The question is, where/how do I implement that implementation of the push command? Below the VM interpreter? That would mean I have to write the allocation logic and creation of the stack frame in JavaScript land, and then the custom language would run in VM land, calling into JavaScript land to allocate stuff on the stack.

But I don't really want to do that. I want to write everything in this custom language! So how can I accomplish that? It is as if I need to create layers of VMs. One VM is running which creates the stack frames and handles commands from a higher level VM. The lower-level VM is implemented using very low-level primitives, smaller than "push onto the stack". It is using basically store and fetch and call and that's it.

Basically I am getting lost here. How do I handle this situation or think about it to get past the mental block? Is there a way to avoid having to create these sorts of layers? Any better way of conceptualizing this?

Solution

I think what I am going to do is inspired from coroutines and something I saw saying to have a secondary stack.

Basically, a "push to the stack" operation can be done in a single step if the operation is only shifting the position of a stack pointer (increment it by the size of the activation record). Then it is createStackFrame(size), and all it does is move a pointer. That could be implemented in the lower-level VM. So there is no need to "allocate memory" when creating a stack frame. You allocate enough memory for a large stack (like 8MB, for example), all up front. Then you can avoid doing allocation at runtime.

But then it gets a little tricky. I read about coroutines that for segmented stacks they have small stack "segments" that are like 4096 bytes in size (a page roughly), and they are linked together into a linked list. So building off of that idea, let's say that a single thread/coroutine/fiber/callback is executed using its own stack, divided into segments so it can grow. I know there's a thrashing problem with this, but I'm not that far yet. But we can handle multiple coroutines/fibers/threads, each with their own segmented stack.

The way to do that is by introducing a second stack, one just for the memory allocation process! In order to implement dynamically creating stacks for coroutines/threads/fibers, you need to allocate a new stack (say 4096 bytes). That allocation algorithm could be a complicated set of function calls (like implementing malloc), much more than the one-step createStackFrame(size) we decided on already. The createStackFrame(size) one is only one step because we have pre-allocated the stack (our current stack for our current fiber, let's say). So it just requires changing a pointer position. But our malloc for allocating a new stack, call it createStack(), requires doing a bunch of stuff potentially. To handle that, and not run into the recursive conundrum outlined in the original question, we have a second stack, one for just the createStack() or malloc implementation!

So then our call to createStackFrame(size) can be preceded by a check to see if we are about to run out of stack segment space, and if so, we switch the processor to use the "allocation stack", and then run the allocation algorithm using that stack (which is fixed and won't grow, if we keep our algorithm simple enough). Once it allocates some space, which toggle back to the last thread/fiber/coroutine stack, and link the new memory allocation space to it. But this toggling between allocation stack and regular coroutine stack will make it so you don't run into the recursive problem in this post. The allocation stack won't need to itself reallocate, and so can use the atomic createStackFrame(size) only!