Tags: java, parsing, antlr4, lexer, parse-tree

ANTLR4 doesn't parse the .g4 file the way I expected


We are a couple of guys doing a university project, which led us to play around with ANTLR4. We are figuring it out as we go, but have stumbled upon an issue we can't seem to fix.

We are currently trying to get the moveList rule in the grammar working, and have been running a simple test.txt containing: la = Path true ATTACK UP 5;

Currently we are working with this .g4 file:

grammar hess;

program: line* EOF;

defineBoard: 'BOARD' '(' BOARDPOSITION ')';

line: statement | ifBlock | whileBlock | defineBoard;

statement: (assignment | functionCall) ';';

ifBlock: 'if' expression block ('else' elseIfBlock);

elseIfBlock: block | ifBlock;

whileBlock: 'while' expression block ('else' elseIfBlock);

assignment: IDENTIFIER '=' moveList | IDENTIFIER '=' expression;

functionCall: IDENTIFIER '(' (expression (',' expression))? ')';

expression:
    constant                            # constantExpression
    | IDENTIFIER                        # identifierExpression
    | '(' expression ')'                # parenthesizedExpression
    | '!' expression                    # notExpression
    | expression multOp expression      # multiplicativeExpression
    | expression addOp expression       # additiveExpression
    | expression compareOp expression   # comparisonExpression
    | expression boolOp expression      # booleanExpression;

moveList: move | move moveTail;
moveTail: ',' move moveTail;
moveExtra:
    INTEGER
    | INTEGER direction INTEGER
    | direction INTEGER;
move: Movetype COLLISION Attacktype direction moveExtra;

IDENTIFIER: [a-zA-Z][a-zA-Z0-9];
constant:
    BOARDPOSITION
    | INTEGER
    | FLOAT
    | LETTER
    | STRING
    | BOOL
    | NULL;
BOARDPOSITION: LETTER INTEGER;
INTEGER: [1-9][0-9]* | [0];
FLOAT: [0-9]+ '.' [0-9]+;
STRING: ('"' ~'"'* '"') | ('\'' ~'\''* '\'');
LETTER: [a-zA-Z];

BOOL: 'true' | 'false';
COLLISION: BOOL;
NULL: 'null';

block: '{' line* '}';

WS: [ \t\r\n]+ -> skip;

multOp: '*' | '/' | '%';
addOp: '+' | '-';
boolOp: 'and' | 'or' | 'xor';
compareOp: '==' | '!=' | '>' | '<' | '>=' | '<=';
direction: 'UP' | 'LEFT' | 'RIGHT' | 'DOWN';
Movetype: 'Direct' | 'Path';
Attacktype: 'ATTACK' | 'MOVE' | 'ATKMOVE';

We expected the parser to go through the grammar like so:

line -> statement -> assignment -> moveList

and then

"la" = IDENTIFIER
"Path" = Movetype
"ATTACK" = Attacktype
"UP" = direction
"5" = INTEGER

When we debug/run the .g4 file, we get the following parse tree: (parse tree image)

As the parse tree shows, the parser reports unexpected tokens for input that, as far as we can tell, is exactly what the grammar expects.


Solution

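  • The problem is that COLLISION is a lexer rule that matches exactly the same text as BOOL. When two lexer rules match the same input with the same length, ANTLR picks the one defined first, so the true in your input is always tokenised as BOOL and never as COLLISION, which is the token the move rule is waiting for. If you rename the COLLISION rule in your grammar to collision, like this: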

    move: Movetype collision Attacktype direction moveExtra;
    collision: BOOL;
    

    Then it becomes a parser rule that accepts a BOOL token, and the parsing works for your test input:

    (program (line (statement (assignment la = (moveList (move Path (collision true) ATTACK (direction UP) (moveExtra 5)))) ;)) <EOF>)
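
    To see why the lexer never produces COLLISION, here is a small Java sketch that imitates ANTLR's tie-breaking (longest match wins, and on equal length the rule defined first wins). It is only an illustration, not ANTLR itself: the class name LexerTieBreakSketch and the method tokenTypes are made up for this example, the rules are a hand-picked subset of your grammar translated to java.util.regex, and whitespace is simply skipped. Literals such as 'UP' are listed first because ANTLR places literals from parser rules ahead of the explicit lexer rules.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: imitates how an ANTLR lexer chooses between rules.
final class LexerTieBreakSketch {
    // Rule name -> pattern, in definition order (LinkedHashMap keeps that order).
    // Assumption: a subset of the question's grammar, with parser-rule literals first.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("'='", Pattern.compile("="));
        RULES.put("';'", Pattern.compile(";"));
        RULES.put("'UP'", Pattern.compile("UP"));
        RULES.put("IDENTIFIER", Pattern.compile("[a-zA-Z][a-zA-Z0-9]"));
        RULES.put("INTEGER", Pattern.compile("[1-9][0-9]*|0"));
        RULES.put("BOOL", Pattern.compile("true|false"));
        RULES.put("COLLISION", Pattern.compile("true|false")); // same text as BOOL
        RULES.put("Movetype", Pattern.compile("Direct|Path"));
        RULES.put("Attacktype", Pattern.compile("ATTACK|MOVE|ATKMOVE"));
    }

    // Returns the name of the winning rule for each token in the input.
    static List<String> tokenTypes(String input) {
        List<String> types = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++; // stands in for "WS -> skip"
                continue;
            }
            String bestRule = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Strictly longer matches replace the current best, so on a tie
                // the rule that was defined earlier keeps the win.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestRule = rule.getKey();
                }
            }
            if (bestRule == null) {
                throw new IllegalArgumentException("No rule matches at index " + pos);
            }
            types.add(bestRule);
            pos += bestLen;
        }
        return types;
    }

    public static void main(String[] args) {
        // Prints: [IDENTIFIER, '=', Movetype, BOOL, Attacktype, 'UP', INTEGER, ';']
        System.out.println(tokenTypes("la = Path true ATTACK UP 5;"));
    }
}
```

    Note that COLLISION never appears in the result, because BOOL is defined before it and matches the same four characters. To check what the real lexer does, and assuming you have the usual grun alias for org.antlr.v4.gui.TestRig set up, running grun hess program -tokens test.txt should list the token types ANTLR actually assigns.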