Search code examples
javascriptdata-structuresscopeprogramming-languagesabstract-syntax-tree

How is "scope" or "context" stored and referenced in a compiled program?


Sorry if I confuse the two terms "scope" and "context", but basically I am referring to the lexical scope I think, but the instance of it when a function (or class body) is evaluating.

I am working on implementing a strange tree-based programming language and have different structures which need a sort of "scope". A similar situation is in 2 languages which I am familiar, JavaScript and less-so Ruby. In Ruby, you have executable class bodies, so they have their own scope, but then you also have executable functions with their own scope. Then inside of blocks, they each have their own scope. The tree of "lexical" scopes is basically the tree of variables/stuff you can reference in the visible code in the parent tree (the visible scope is the lexical scope).

Likewise, I would like to implement this for modules, functions, and some other things like my views/components (which I don't want to be functions like React, similarly modules in my lang are not evaluated functions). In JavaScript you just have functions which get called and the scope is stored in each function somehow.

But what I'm wondering is, what is the data model / data structure / or generally the "structure" of how the scope is stored and referenced? Literally in the AST or wherever, how does it work?

What I am imagining doing is having a module be wrapped in actually a "tree" object, like:

{
  type: 'tree',
  scope: {
    foo: 'bar',
  },
  object: module
}

Here the tree object is some sort of AST wrapper object which has a reference to the scope used on the object, and the object itself. This is to prevent the object from being polluted with an extra scope property, which it doesn't have: a module does not have a scope property in the general sense. The resolution of the module has a scope, so maybe the tree object is actually called the resolution object.

{
  type: 'resolution',
  scope: {
    foo: 'bar',
  },
  object: module
}

But then it starts to get hairy. Within the module are classes, which have a class scope and an instance scope. Or functions which have an instance scope. Or other trees of scope in different custom objects I'm working with.

So what I'm imagining doing is calling this resolution object a "fork", and essentially building a data tree. Each fork object has a scope property on it, and every object is actually wrapped in a fork.

const moduleFork = {
  type: 'fork',
  scope: {
    foo: 'bar',
  },
  children: {
    // this is a module now.
    functions: {
      type: 'fork',
      children: {}
    }
  }
}

Then inside the functions object, we have a function example:

// the functions scope is the module itself.
moduleFork.children.functions.scope = moduleFork.children

// then a function func1
moduleFork.children.functions.func1 = {
  // this is inside the function now.
  type: 'fork',
  scope: moduleFork.children,
  children: {
    params: {
      type: 'fork',
      list: true,
      children: {
        param1: {
          type: 'fork',
          children: {
            name: {
              type: 'string',
              value: 'param1'
            },
            default: {
              type: 'string',
              value: 'hello world'
            }
          }
        }
      }
    }
  }
}

I dunno, something like that, I'm still working on it. But as such, every single object in the AST has a reference to a scope it can use to fetch variable values. I omitted some of the repetition, but in reality it would look like this more or less:

// then a function func1
{
  // this is inside the function now.
  type: 'fork',
  scope: moduleFork.children,
  children: {
    params: {
      type: 'fork',
      list: true,
      scope: moduleFork.children,
      children: {
        param1: {
          type: 'fork',
          scope: moduleFork.children,
          children: {
            name: {
              type: 'string',
              value: 'param1'
            },
            default: {
              type: 'string',
              value: 'hello world'
            }
          }
        }
      }
    }
  }
}

In that way, you could do getFromTree(astNode.scope, 'foo') and it would be straightforward. Likewise, since we are in the context of a module, we can do getFromTree(paramNode.scope, 'functions') and it would get the functions array from the module (deserializing from the fork wrapper sort of thing).

In my language, some AST nodes will be like { type: 'reference', path: ['foo'] }, and in this case, it is important for this object to know the scope so it can resolve foo. So having the "wrapping fork" concept seems like it will make it easy to pass the scope around, especially once you have events and data binding sorts of things. Rather than having actual nested bound function scopes like in JavaScript, you are just dealing with AST tree objects.

The key question is, is this how it is done in other programming language implementations? For example, when v8 compiles JavaScript, does every AST object have a reference to its scope object? As such, is each AST object sort of wrapped in a "scopeResolver" object sort of thing, like I am trying to do here?

If you could, please paint a basic picture of the data structure of how scope works, and if it is referenced in every object like I am trying to do here.


Solution

  • V8 represents scopes through a class named Scope which is referenced in the Block AST Node, as well as in all other nodes that introduce their own scope, such as With, TryCatch or FunctionExpression.

    For example, when v8 compiles JavaScript, does every AST object have a reference to its scope object? As such, is each AST object sort of wrapped in a "scopeResolver" object sort of thing, like I am trying to do here?

    No and no. During interpretation and compilation the AST tree is always traversed depth-first, thus when visiting a node that reads or writes to a variable the parent blocks were already traversed. Thus it seems way easier to store the parent blocks in some sort of stack during traversal instead of adding another reference to each node in the AST. Especially with manual memory management this seems like a large overhead with no benefit.

    As far as I can see the V8 interpreter keeps the scopes inside the ContextScope linked list.