i am building a simple grammar parser, with regex. It works but now i want to add Abstract Syntax Tree. But i still dont understand how to set it up. i included the parser.
The parser gets a string and tokeniaze it with the lexer.
The tokens include the value and a type.
Any idea how to setup nodes to build a AST?
public class Parser {
lexer lex;
Hashtable<String, Integer> data = new Hashtable<String, Integer>();
public Parser( String str){
ArrayList<Token> token = new ArrayList<Token>();
String[] strpatt = { "[0-9]*\\.[0-9]+", //0
"[a-zA-Z_][a-zA-Z0-9_]*",//1
"[0-9]+",//2
"\\+",//3
"\\-",//4
"\\*",//5
"\\/",//6
"\\=",// 7
"\\)",// 8
"\\("//9
};
lex = new lexer(strpatt, "[\\ \t\r\n]+");
lex.set_data(str);
}
public int peek() {
//System.out.println(lex.peek().type);
return lex.peek().type;
}
public boolean peek( String[] regex) {
return lex.peek(regex);
}
public void set_data( String s) {
lex.set_data(s);
}
public Token scan() {
return lex.scan();
}
public int goal() {
int ret = 0;
while(peek() != -1) {
ret = expr();
}
return ret;
}
}
Currently, you are simply evaluating as you parse:
ret = ret * term()
The easiest way to think of an AST is that it is just a different kind of evaluation. Instead of producing a numeric result from numeric sub-computations, as above, you produce a description of the computation from descriptions of the sub-computations. The description is represented as small structure which contains the essential information:
ret = BuildProductNode(ret, term());
Or, perhaps
ret = BuildBinaryNode(Product, ret, term());
It's a tree because the Node objects which are being passed around refer to other Node objects without there ever being a cycle or a node with two different parents.
Clearly there are a lot of details missing from the above, particularly the precise nature of the Node object. But it's a rough outline.