compiler-construction abstract-syntax-tree javacc semantic-analysis symbol-table

JavaCC Interpreter (AST to Symbol Table)

I'm getting quite confused about how I can create an javacc interpreter, particularly how to build a symbol table from an AST tree generated previously.

Something like this, from this AST:

> Program
>  Id
>  Id
>  Id
>  VarDecl
>   Type
>   Id
>  Stl
>   Id
>   NewInt
>    IntLit
>  Sta
>   Id
>   IntLit
>   ParseArgs
>    Id
>    IntLit
>  Sta
>   Id
>   IntLit
>   ParseArgs
>    Id
>    IntLit
(…)

To this table

=== Symbol table ===
Name    Type    Init    Values
----    ----    ----    ------
args    args[]  true    2   12  8
x        int[]  true    2   4   0

With this Input for example

class gcd {
  public static void main(String[] args) {
    int[] x;
    x = new int[2];

    x[0] = Integer.parseInt(args[0]);
    x[1] = Integer.parseInt(args[1]);
    if (x[0] == 0)
      System.out.println(x[1]);
    else
      while (x[1] > 0) {
        if (x[0] > x[1])
          x[0] = x[0] - x[1];
        else
          x[1] = x[1] - x[0];
      }
    System.out.println(x[0]);
  }
}

What i have now, only creates the AST.

My big issue is how to define and then compare the Types on tree, one by one.

Any help would be great, including theory.

Thanks.

Solution

The simple answer is, "walk the tree, build the symbol table".

As your recurse down the tree, you will encounter scoping constructs; build a scope for each such node encountered, and push on the top of a scope stack. As you visit declaration children of the node introduced that a scope, insert the definition of the declaration in the scope on top of the scope stack. As you return from a node that introduced a scope, pop your scope stack. Voila: symbol tables for lexically-scoped languages. All this is independent of JavaCC, and is explained well in compiler books; suggest you get one and read it carefully.

Languages with namespaces are not so easy, but can be shoehorned into this structure. There are languages with more complex relations between scopes where this isn't so easy.

Now, to do this for Java... the details of the type system are complex and arcane, and the complexity of knowing what a syntax for a type means is surprisingly complex, esp. with templated types. No compiler book can help you here; you need to interpret the Java reference manual if you build your own symbol table. Expect this be quite difficult; it is for everybody else.

You'll discover one of those "not so easily nested" scoping issues when you encounter package references; to resolve names for the module containing the package reference, you first have to find the source or class file for the referenced package and build symbol tables for that. This in effect means that in the middle of building a symbol table for one file, you may have to reach into the file system, parse a file [as text or as class] and build symbol tables for that before proceeding.

Bottom line: Java is full of inconvenient surprises for symbol-table builders.

(I build program analysis tools. It took us only a few days to get all the various Java dialects to parse partly because we have really good parsing machinery; it has taken us months to build an appropriate symbol table up through Java 1.7 and we are working on Java 1.8).

If you actually want to use the AST and symbol table, you're much better off getting/using somebody else's parser/name-type resolver. The JDT AST machinery comes to mind. My company offers a tool in this space, too.