I created a simple grammar in ANTLRWorks. Then I generated the code and got two files: `grammarLexer.java` and `grammarParser.java`. My goal is to map my grammar to the Java language. What should I do next to achieve this?
Here is my grammar:
`
grammar grammar;
prog : ((FOR | WHILE | IF | PRINT | DECLARATION | ENTER | (WS* FUNCTION) | VARIABLE) | FUNCTION_DEC)+;
FOR : WS* 'for' WS+ VARIABLE WS+ DIGIT+ WS+ DIGIT+ WS* ENTER ( FOR | WHILE | IF | PRINT | DECLARATION | ENTER | (WS* FUNCTION) | INC_DEC )* WS* 'end' WS* ENTER;
WHILE : WS* 'while' WS+ (VARIABLE | DIGIT+) WS* EQ_OPERATOR WS* (VARIABLE | DIGIT+) WS* ENTER (FOR | WHILE | IF | PRINT | DECLARATION | ENTER | (WS* FUNCTION) | (WS* INC_DEC))* WS* 'end' WS* ENTER;
IF : WS* 'if' WS+ ( FUNCTION | VARIABLE | DIGIT+) WS* EQ_OPERATOR WS* (VARIABLE | DIGIT+) WS* ENTER (FOR | WHILE | IF | PRINT | DECLARATION | ENTER | (WS* FUNCTION) | INC_DEC)* ( WS* 'else' ENTER (FOR | WHILE | IF | PRINT | DECLARATION | ENTER | (WS* FUNCTION) | (WS* INC_DEC))*)? WS* 'end' WS* ENTER;
CHAR : ('a'..'z'|'A'..'Z')+;
EQ_OPERATOR : ('<' | '>' | '==' | '>=' | '<=' | '!=');
DIGIT : '0'..'9'+;
ENTER : '\n';
WS : ' ' | '\t';
PRINT_TEMPLATE : WS+ (('"' (CHAR | DIGIT | WS)* '"') | VARIABLE | DIGIT+ | FUNCTION | INC_DEC);
PRINT : WS* 'print' PRINT_TEMPLATE (',' PRINT_TEMPLATE)* WS* ENTER;
VARIABLE : CHAR(CHAR|DIGIT)*;
FUN_TEMPLATE : WS* (VARIABLE | DIGIT+ | '"' (CHAR | DIGIT | WS)* '"');
FUNCTION : VARIABLE '(' (FUN_TEMPLATE (WS* ',' FUN_TEMPLATE)*)? ')' WS* ENTER*;
DECLARATION : WS* VARIABLE WS* ('=' WS* (DIGIT+ | '"' (CHAR | DIGIT | WS)* '"' | VARIABLE)) WS* ENTER;
FUNCTION_DEC : WS*'def' WS* FUNCTION ( FOR | WHILE | IF | PRINT | DECLARATION | ENTER | (WS* FUNCTION) | INC_DEC )* WS* 'end' WS* ENTER*;
INC_DEC : VARIABLE ('--' | '++') WS* ENTER*;`
Here is my Main class for the parser:
`
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonToken;
import org.antlr.runtime.CommonTokenStream;

public class Main {
    public static void main(String[] args) throws Exception {
        // the input source
        String source =
            "for i 1 3\n " +
            "printHi()\n " +
            "end\n " +
            "if fun(y, z) == 0\n " +
            "end\n ";
        // create an instance of the lexer
        grammarLexer lexer = new grammarLexer(new ANTLRStringStream(source));
        // wrap a token stream around the lexer
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        // traverse the tokens and print them to see if the correct tokens are created
        int n = 1;
        for (Object o : tokens.getTokens()) {
            CommonToken token = (CommonToken) o;
            System.out.println("token(" + n + ") = " + token.getText().replace("\n", "\\n"));
            n++;
        }
        // invoke the start rule of the grammar (prog); there is no rule named file
        grammarParser parser = new grammarParser(tokens);
        parser.prog();
    }
}
`
As I already mentioned in the comments, you are overusing lexer rules. Think of lexer rules as the fundamental building blocks of your language, much like how you'd describe water in chemistry. You would not describe water like this:
WATER
: 'HHO'
;
That is, as a single element. Instead, water should be described as a compound of three separate atoms:
water
: Hydrogen Hydrogen Oxygen
;
Hydrogen : 'H';
Oxygen : 'O';
where `Hydrogen` and `Oxygen` are the fundamental building blocks (lexer rules) and `water` is the compound (the parser rule).
A good rule of thumb is that if you're creating lexer rules that consist of several other lexer rules, chances are there's something fishy in your grammar. This is not always the case, of course.
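To make the split concrete outside of ANTLR, here is a minimal hand-written Java sketch (the class, enum and method names are made up for illustration). The "lexer" only knows the two atoms `H` and `O`; the "parser rule" `water` then checks that the resulting token sequence is Hydrogen Hydrogen Oxygen.

```java
import java.util.ArrayList;
import java.util.List;

public class WaterDemo {

    // The two "lexer rules": the fundamental building blocks.
    enum TokenType { HYDROGEN, OXYGEN }

    // Lexer: turns raw characters into a list of atomic tokens.
    static List<TokenType> lex(String input) {
        List<TokenType> tokens = new ArrayList<>();
        for (char c : input.toCharArray()) {
            switch (c) {
                case 'H': tokens.add(TokenType.HYDROGEN); break;
                case 'O': tokens.add(TokenType.OXYGEN); break;
                default: throw new IllegalArgumentException("unexpected char: " + c);
            }
        }
        return tokens;
    }

    // Parser rule, equivalent to: water : Hydrogen Hydrogen Oxygen ;
    static boolean parseWater(List<TokenType> tokens) {
        return tokens.size() == 3
            && tokens.get(0) == TokenType.HYDROGEN
            && tokens.get(1) == TokenType.HYDROGEN
            && tokens.get(2) == TokenType.OXYGEN;
    }

    public static void main(String[] args) {
        System.out.println(lex("HHO"));             // [HYDROGEN, HYDROGEN, OXYGEN]
        System.out.println(parseWater(lex("HHO"))); // true
        System.out.println(parseWater(lex("HOH"))); // false: right atoms, wrong order
    }
}
```

Note that `HOH` lexes without error: the lexer only cares about atoms, while deciding whether they form valid water is the parser's job.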
Let's say you want to parse the following input:
for i 1 3
print(i)
end
if fun(y, z) == 0
print('foo')
end
A grammar could look like this:
grammar T;
options {
  output=AST;
}
tokens {
  BLOCK;
  CALL;
  PARAMS;
}
// parser rules
parse
  :  block EOF!
  ;
block
  :  stat* -> ^(BLOCK stat*)
  ;
stat
  :  for_stat
  |  if_stat
  |  func_call
  ;
for_stat
  :  FOR^ ID expr expr block END!
  ;
if_stat
  :  IF^ expr block END!
  ;
expr
  :  eq_expr
  ;
eq_expr
  :  atom (('==' | '!=')^ atom)*
  ;
atom
  :  func_call
  |  INT
  |  ID
  |  STR
  ;
func_call
  :  ID '(' params ')' -> ^(CALL ID params)
  ;
params
  :  (expr (',' expr)*)? -> ^(PARAMS expr*)
  ;
// lexer rules
FOR : 'for';
END : 'end';
IF  : 'if';
ID  : ('a'..'z' | 'A'..'Z')+;
INT : '0'..'9'+;
STR : '\'' ~('\'')* '\'';
SP  : (' ' | '\t' | '\r' | '\n')+ {skip();};
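The `^`, `!` and `->` operators in the parser rules decide the shape of the tree the parser builds: for example `for_stat` yields a tree rooted at `for` with children `ID expr expr BLOCK`, and `func_call` yields `^(CALL ID ^(PARAMS ...))`. Mapping your language to Java then comes down to walking that tree and emitting Java source per node. Below is a minimal sketch of that idea. To keep it self-contained it does not use the ANTLR runtime: `Node` and `Kind` are hypothetical stand-ins for ANTLR's `CommonTree` and the token-type constants generated in `TParser` (with a real `CommonTree` you would switch on `getType()` and use `getText()`, `getChildCount()` and `getChild(i)`). The translation rules are assumptions for illustration only: the `for` upper bound is treated as inclusive and `print` is mapped to `System.out.println`.

```java
import java.util.Arrays;
import java.util.List;

public class AstToJava {
    // Hypothetical node kinds mirroring the token types that appear in grammar T's AST.
    enum Kind { BLOCK, FOR, IF, CALL, PARAMS, OP, ID, INT, STR }

    // Hypothetical stand-in for ANTLR's CommonTree: a kind, the token text, and children.
    static final class Node {
        final Kind kind;
        final String text;
        final List<Node> children;
        Node(Kind kind, String text, Node... children) {
            this.kind = kind;
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    // Translates a ^(BLOCK stat*) tree into Java source text.
    static String translate(Node block) {
        StringBuilder out = new StringBuilder();
        emitBlock(block, "", out);
        return out.toString();
    }

    static void emitBlock(Node block, String indent, StringBuilder out) {
        for (Node stat : block.children) emitStatement(stat, indent, out);
    }

    static void emitStatement(Node n, String indent, StringBuilder out) {
        switch (n.kind) {
            case FOR: { // ^(for ID expr expr BLOCK); assumes an inclusive upper bound
                String v = n.children.get(0).text;
                out.append(indent).append("for (int ").append(v).append(" = ")
                   .append(expr(n.children.get(1))).append("; ").append(v).append(" <= ")
                   .append(expr(n.children.get(2))).append("; ").append(v).append("++) {\n");
                emitBlock(n.children.get(3), indent + "    ", out);
                out.append(indent).append("}\n");
                break;
            }
            case IF: { // ^(if expr BLOCK)
                out.append(indent).append("if (").append(expr(n.children.get(0))).append(") {\n");
                emitBlock(n.children.get(1), indent + "    ", out);
                out.append(indent).append("}\n");
                break;
            }
            case CALL: // a function call used as a statement
                out.append(indent).append(expr(n)).append(";\n");
                break;
            default:
                throw new IllegalArgumentException("not a statement: " + n.kind);
        }
    }

    static String expr(Node n) {
        switch (n.kind) {
            case ID:
            case INT:
                return n.text;
            case STR: // 'foo' becomes "foo"
                return "\"" + n.text.substring(1, n.text.length() - 1) + "\"";
            case OP: // ^('==' atom atom) or ^('!=' atom atom)
                return expr(n.children.get(0)) + " " + n.text + " " + expr(n.children.get(1));
            case CALL: { // ^(CALL ID ^(PARAMS expr*))
                String name = n.children.get(0).text;
                if (name.equals("print")) name = "System.out.println"; // assumed mapping
                StringBuilder params = new StringBuilder();
                for (Node p : n.children.get(1).children) {
                    if (params.length() > 0) params.append(", ");
                    params.append(expr(p));
                }
                return name + "(" + params + ")";
            }
            default:
                throw new IllegalArgumentException("not an expression: " + n.kind);
        }
    }

    static Node leaf(Kind kind, String text) { return new Node(kind, text); }
    static Node call(String name, Node... params) {
        return new Node(Kind.CALL, "CALL", leaf(Kind.ID, name), new Node(Kind.PARAMS, "PARAMS", params));
    }

    // Hand-built tree matching the shape grammar T produces for the example input.
    static Node sampleTree() {
        return new Node(Kind.BLOCK, "BLOCK",
            new Node(Kind.FOR, "for", leaf(Kind.ID, "i"), leaf(Kind.INT, "1"), leaf(Kind.INT, "3"),
                new Node(Kind.BLOCK, "BLOCK", call("print", leaf(Kind.ID, "i")))),
            new Node(Kind.IF, "if",
                new Node(Kind.OP, "==", call("fun", leaf(Kind.ID, "y"), leaf(Kind.ID, "z")), leaf(Kind.INT, "0")),
                new Node(Kind.BLOCK, "BLOCK", call("print", leaf(Kind.STR, "'foo'")))));
    }

    public static void main(String[] args) {
        System.out.print(translate(sampleTree()));
    }
}
```

For the example input this emits a Java `for` loop that prints `i`, followed by an `if (fun(y, z) == 0)` block that prints `"foo"`. Keeping the translation in a separate tree walk, rather than in lexer or parser actions, lets you change the target output without touching the grammar.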
And if you now run this test class:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
    public static void main(String[] args) throws Exception {
        String src =
            "for i 1 3 \n" +
            " print(i) \n" +
            "end \n" +
            " \n" +
            "if fun(y, z) == 0 \n" +
            " print('foo') \n" +
            "end \n";
        TLexer lexer = new TLexer(new ANTLRStringStream(src));
        TParser parser = new TParser(new CommonTokenStream(lexer));
        CommonTree tree = (CommonTree)parser.parse().getTree();
        DOTTreeGenerator gen = new DOTTreeGenerator();
        StringTemplate st = gen.toDOT(tree);
        System.out.println(st);
    }
}
you'll see DOT (Graphviz) output printed to the console, which corresponds to the following AST: