Our last assignment for our compiler theory class has us creating a compiler for a small subset of Java (not MiniJava). Our prof gave us the option of using whatever tools we wish, and after a lot of poking around, I settled on ANTLR. I've managed to get the scanner and the parser up and running, and the parser outputting an AST. I'm stuck now trying to get a tree grammar file to compile. I understand the basic idea is to copy the grammar rules from the parser and eliminate most of the code, leaving the rewrite rules in place, but it doesn't seem to want to compile (offendingToken error). Am I on the right track? Am I missing something trivial?
Tree Grammar:
tree grammar J0_SemanticAnalysis;
options {
language = Java;
tokenVocab = J0_Parser;
ASTLabelType = CommonTree;
}
@header
{
package ritterre.a4;
import java.util.Map;
import java.util.HashMap;
}
@members
{
}
walk
: compilationunit
;
compilationunit
: ^(UNIT importdeclaration* classdeclaration*)
;
importdeclaration
: ^(IMP_DEC IDENTIFIER+)
;
classdeclaration
: ^(CLASS IDENTIFIER ^(EXTENDS IDENTIFIER)? fielddeclaration* methoddeclaration*)
;
fielddeclaration
: ^(FIELD_DEC IDENTIFIER type visibility? STATIC?)
;
methoddeclaration
: ^(METHOD_DEC IDENTIFIER type visibility? STATIC? ^(PARAMS parameter+)? body)
;
visibility
: PRIVATE
| PUBLIC
;
parameter
: ^(PARAM IDENTIFIER type)
;
body
: ^(BODY ^(DECLARATIONS localdeclaration*) ^(STATEMENTS statement*))
;
localdeclaration
: ^(DECLARATION type IDENTIFIER)
;
statement
: assignment
| ifstatement
| whilestatement
| returnstatement
| callstatement
| printstatement
| block
;
assignment
: ^(ASSIGN IDENTIFIER+ expression? expression)
;
ifstatement
: ^(IF relation statement ^(ELSE statement)?)
;
whilestatement
: ^(WHILE relation statement)
;
returnstatement
: ^(RETURN expression?)
;
callstatement
: ^(CALL IDENTIFIER+ expression+)
;
printstatement
: ^(PRINT expression)
;
block
: ^(STATEMENTS statement*)
;
relation
// : expression (LTHAN | GTHAN | EQEQ | NEQ)^ expression
: ^(LTHAN expression expression)
| ^(GTHAN expression expression)
| ^(EQEQ expression expression)
| ^(NEQ expression expression)
;
expression
// : (PLUS | MINUS)? term ((PLUS | MINUS)^ term)*
: ^(PLUS term term)
| ^(MINUS term term)
;
term
// : factor ((MULT | DIV)^ factor)*
: ^(MULT factor factor)
| ^(DIV factor factor)
;
factor
: NUMBER
| IDENTIFIER (DOT IDENTIFIER | LBRAC expression RBRAC)?
| NULL
| NEW IDENTIFIER LPAREN RPAREN
| NEW (INT | IDENTIFIER) (LBRAC RBRAC)?
;
type
: (INT | IDENTIFIER) (LBRAC RBRAC)?
| VOID
;
Parser Grammar:
parser grammar J0_Parser;
options
{
output = AST; // Output an AST
tokenVocab = J0_Scanner; // Pull Tokens from Scanner
//greedy = true; // forcing this throughout?! success!!
//cannot force greedy true throughout. bad things happen and the parser doesnt build
}
tokens
{
UNIT;
IMP_DEC;
FIELD_DEC;
METHOD_DEC;
PARAMS;
PARAM;
BODY;
DECLARATIONS;
STATEMENTS;
DECLARATION;
ASSIGN;
CALL;
}
@header { package ritterre.a4; }
// J0 - Extended Specification - EBNF
parse
: compilationunit EOF -> compilationunit
;
compilationunit
: importdeclaration* classdeclaration*
-> ^(UNIT importdeclaration* classdeclaration*)
;
importdeclaration
: IMPORT IDENTIFIER (DOT IDENTIFIER)* SCOLON
-> ^(IMP_DEC IDENTIFIER+)
;
classdeclaration
: (PUBLIC)? CLASS n=IDENTIFIER (EXTENDS e=IDENTIFIER)? LBRAK (fielddeclaration|methoddeclaration)* RBRAK
-> ^(CLASS $n ^(EXTENDS $e)? fielddeclaration* methoddeclaration*)
;
fielddeclaration
: visibility? STATIC? type IDENTIFIER SCOLON
-> ^(FIELD_DEC IDENTIFIER type visibility? STATIC?)
;
methoddeclaration
: visibility? STATIC? type IDENTIFIER LPAREN (parameter (COMMA parameter)*)? RPAREN body
-> ^(METHOD_DEC IDENTIFIER type visibility? STATIC? ^(PARAMS parameter+)? body)
;
visibility
: PRIVATE
| PUBLIC
;
parameter
: type IDENTIFIER
-> ^(PARAM IDENTIFIER type)
;
body
: LBRAK localdeclaration* statement* RBRAK
-> ^(BODY ^(DECLARATIONS localdeclaration*) ^(STATEMENTS statement*))
;
localdeclaration
: type IDENTIFIER SCOLON
-> ^(DECLARATION type IDENTIFIER)
;
statement
: assignment
| ifstatement
| whilestatement
| returnstatement
| callstatement
| printstatement
| block
;
assignment
: IDENTIFIER (DOT IDENTIFIER | LBRAC a=expression RBRAC)? EQ b=expression SCOLON
-> ^(ASSIGN IDENTIFIER+ $a? $b)
;
ifstatement
: IF LPAREN relation RPAREN statement (options {greedy=true;} : ELSE statement)?
-> ^(IF relation statement ^(ELSE statement)?)
;
whilestatement
: WHILE LPAREN relation RPAREN statement
-> ^(WHILE relation statement)
;
returnstatement
: RETURN expression? SCOLON
-> ^(RETURN expression?)
;
callstatement
: IDENTIFIER (DOT IDENTIFIER)? LPAREN (expression (COMMA expression)*)? RPAREN SCOLON
-> ^(CALL IDENTIFIER+ expression+)
;
printstatement
: PRINT LPAREN expression RPAREN SCOLON
-> ^(PRINT expression)
;
block
: LBRAK statement* RBRAK
-> ^(STATEMENTS statement*)
;
relation
: expression (LTHAN | GTHAN | EQEQ | NEQ)^ expression
;
expression
: (PLUS | MINUS)? term ((PLUS | MINUS)^ term)*
;
term
: factor ((MULT | DIV)^ factor)*
;
factor
: NUMBER
| IDENTIFIER (DOT IDENTIFIER | LBRAC expression RBRAC)?
| NULL
| NEW IDENTIFIER LPAREN RPAREN
| NEW (INT | IDENTIFIER) (LBRAC RBRAC)?
;
type
: (INT | IDENTIFIER) (LBRAC RBRAC)?
| VOID
;
The problem is that in your tree grammar, you do the following (3 times I believe):
classdeclaration
: ^(CLASS ... ^(EXTENDS IDENTIFIER)? ... )
;
The ^(EXTENDS IDENTIFIER)? part is wrong: you need to wrap the tree in parentheses first, and only then make it optional:
classdeclaration
: ^(CLASS ... (^(EXTENDS IDENTIFIER))? ... )
;
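The other two occurrences in the tree grammar from the question are the optional PARAMS subtree in methoddeclaration and the optional ELSE subtree in ifstatement. Assuming the rest of each rule stays exactly as posted, the same wrapping would look roughly like this:

```
methoddeclaration
  : ^(METHOD_DEC IDENTIFIER type visibility? STATIC? (^(PARAMS parameter+))? body)
  ;

ifstatement
  : ^(IF relation statement (^(ELSE statement))?)
  ;
```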
However, it would be a bit too easy if that was all, wouldn't it? :)
Once you fix the problem mentioned above, ANTLR will complain that the tree grammar is ambiguous when it tries to generate a tree walker from it. It will report the following error:
error(211): J0_SemanticAnalysis.g:61:26: [fatal] rule assignment has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
It complains about the assignment rule in your grammar:
assignment
: ^(ASSIGN IDENTIFIER+ expression? expression)
;
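Since ANTLR is an LL parser generator [1], it parses tokens from left to right. Therefore the optional expression in IDENTIFIER+ expression? expression makes the grammar ambiguous: when the walker reaches the first expression, it cannot tell whether it is the optional one or the mandatory one without looking past it, and because expression is recursive, no fixed amount of lookahead is enough. Fix this by moving the ? to the last expression: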
assignment
: ^(ASSIGN IDENTIFIER+ expression expression?)
;
[1] Don't let the last two letters in the name ANTLR mislead you: they stand for Language Recognition, not the class of parsers it generates!