Search code examples
compiler-constructionbytecodeinterpreter

Multiple passes for name resolution and code-generation in interpreters


I am implementing my own small programming language which allows usage of functions and types (structs) even before they are declared. I have the parser giving out an AST. Now my doubt is:

  1. Name Resolution: apart from variables (which cannot be used before declaration) other items like functions and types, is it possible to name resolve all the identifiers in a single pass ? Currently what I am doing is I have first pass which walks the AST and catch declaration nodes and build up the scope table (with nested scopes etc.). Then I have a second pass in which I bind all the identifier to the respective declaration entries in scope table. Is there any better way of name resolution than two-pass ?

  2. ByteCode-Generation: My language has a VM and so I want to generate bytecode again by walking the AST. Again the same problem kicks in. As functions can be called before function declaration, when I pass through a function call for which there is no function object existing (as code-generator has not yet encountered the function declaration) I believe we again need two pass code-generation where in first pass we generate functions objects (and struct objects) by capturing declaration nodes and then in second pass we generate function call bytecode by using the function object and arguments (and pushing them on stack etc.). Again is there any better way of doing code-generation than two passes ?

NOTE: The language is statically-typed so name resolution and binding of identifiers are done at compile time and not at code-generation.

EDIT: I found below snippet in the CPython repo in symtable.c

"The symbol table requires two passes to determine the scope of each name. The first pass collects raw facts from the AST via the symtable_visit_* functions: the name is a parameter here, the name is used but not defined here, etc. The second pass analyzes these facts during a pass over the PySTEntryObjects created during pass 1."


Solution

  • The whole point of building an AST is to make it simple to do multiple passes over the program text. Writing multiple passes is both easier and more scalable than trying to shoehorn large complex operations with multiple dependencies into a single pass. Each pass should do one thing, and do it as simply and efficiently as possible. That reduces the problem to running the passes in the correct order.

    If you don't require declare-before-use, then you have to do name resolution in two passes (or, in some languages with more complex rules, more than two). Building up scope tables containing declarations, and then binding each variable use to the appropriate declaration in a second pass, seems to be precisely the way to go. If you do that, you may even be able to lift the rule that variables be declared before use; it's often convenient to allow more flexibility. (Many languages allow class members to be declared after function definitions which use those members, for example.) But it's your language; I'm just mentioning the possibility.

    As for code generation, I don't see why you would generate bytecode before you do name resolution. Once name resolution is done, there is no longer a problem identifying functions at point-of-call, even if the actual declaration comes later. That's not to say that code generation will always be done in a single pass; it's quite common to do code transformations of one kind or another (often optimisations, but also desugaring), and those transformations are more conveniently done in their own pass(es).