Search code examples
javaparsingnodesabstract-syntax-tree

Java - Abstract Syntax Tree with grammar


i am building a simple grammar parser, with regex. It works but now i want to add Abstract Syntax Tree. But i still dont understand how to set it up. i included the parser. The parser gets a string and tokeniaze it with the lexer.
The tokens include the value and a type. Any idea how to setup nodes to build a AST?

public class Parser {
    lexer lex;
    Hashtable<String, Integer> data = new Hashtable<String, Integer>();


    public Parser( String str){
       ArrayList<Token> token = new ArrayList<Token>();

       String[] strpatt = { "[0-9]*\\.[0-9]+", //0
                            "[a-zA-Z_][a-zA-Z0-9_]*",//1
                            "[0-9]+",//2
                            "\\+",//3
                            "\\-",//4
                            "\\*",//5
                            "\\/",//6
                            "\\=",// 7
                            "\\)",// 8
                            "\\("//9
                          };

        lex = new lexer(strpatt, "[\\ \t\r\n]+");
        lex.set_data(str);
    }
    public int peek() {
      //System.out.println(lex.peek().type);
      return lex.peek().type;
    }
    public boolean peek( String[] regex) {
      return lex.peek(regex);
    }
    public void set_data( String s) {
      lex.set_data(s);
    }
    public Token scan() {
      return lex.scan();
    }
    public int goal() {
      int ret = 0;
      while(peek() != -1) {
        ret = expr();
      }
      return ret;
    }


}



Solution

  • Currently, you are simply evaluating as you parse:

    ret = ret * term()
    

    The easiest way to think of an AST is that it is just a different kind of evaluation. Instead of producing a numeric result from numeric sub-computations, as above, you produce a description of the computation from descriptions of the sub-computations. The description is represented as small structure which contains the essential information:

    ret = BuildProductNode(ret, term());
    

    Or, perhaps

        ret = BuildBinaryNode(Product, ret, term());
    

    It's a tree because the Node objects which are being passed around refer to other Node objects without there ever being a cycle or a node with two different parents.

    Clearly there are a lot of details missing from the above, particularly the precise nature of the Node object. But it's a rough outline.