
Why isn't this expression being parsed correctly?


NOTE: This is a continuation of the topic posted HERE.

I'm working on a parser for the Jass scripting language (here's an excellent API reference for it) so that I can use it as an interpreter for another language. Using ANTLR4 + ANTLRWorks 2, I have run this complex script to test the strength of the lexer and parser, and it passes nearly all tests. The one place it fails is an 'elseif' statement containing an expression with:

  • an outer parenthesis...
  • an array element...
  • a boolean/binary operation, AND...
  • a unary constant integer

...like so:

elseif(si__DroneSystem___data_V[this]!=-1)then (line #53 of the script).


None of the changes I've made to the grammar gets ANTLR to recognize this input as a proper expression. The following grammar is what I've managed to write so far:

grammar Jass;

//----------------------------------------------------------------------
// Global Declarations
//----------------------------------------------------------------------
program                 : file+
                        ;
file                    : declaration* function
                        ;
declaration             : globals | typedef | native_func
                        ;
typedef                 : KEYWORD_TYPE identifier KEYWORD_EXTENDS (TYPE_HANDLE | identifier) 
                        ;
globals                 : KEYWORD_GLOBALS global_var_list KEYWORD_ENDGLOBALS
                        ;
global_var_list         : var_declaration*
                        ;
native_func             : KEYWORD_CONSTANT? KEYWORD_NATIVE func_declaration
                        ;
func_declaration        : identifier KEYWORD_TAKES (KEYWORD_NOTHING | parameter_list) KEYWORD_RETURNS (KEYWORD_NOTHING | type)
                        ;
parameter_list          : type identifier (',' type identifier)*
                        ;
function                : KEYWORD_CONSTANT? KEYWORD_FUNCTION func_declaration local_var_list statement_list KEYWORD_ENDFUNCTION
                        ;

//----------------------------------------------------------------------
// Local Declarations
//----------------------------------------------------------------------
local_var_list          : (KEYWORD_LOCAL? var_declaration)*
                        ;
var_declaration         : KEYWORD_CONSTANT type identifier '=' expression
                        | type identifier ('=' expression)? | type TYPE_ARRAY identifier
                        ;

//----------------------------------------------------------------------
// Statements
//----------------------------------------------------------------------
statement_list          : statement*
                        ;
statement               : set | call | if_statement | loop | exitwhen | return_statement | debug
                        ;
set                     : KEYWORD_SET identifier '=' expression | KEYWORD_SET identifier OPENBRACKET expression CLOSEBRACKET '=' expression
                        ;
call                    : KEYWORD_CALL identifier OPENPARENTHESIS args? CLOSEPARENTHESIS
                        ;
args                    : expression (COMMA expression)*
                        ;
if_statement            : KEYWORD_IF expression KEYWORD_THEN statement_list else_clause? KEYWORD_ENDIF
                        ;
else_clause             : KEYWORD_ELSEIF ((OPENPARENTHESIS expression CLOSEPARENTHESIS) | expression) KEYWORD_THEN statement_list
                        | KEYWORD_ELSE ((OPENPARENTHESIS statement_list CLOSEPARENTHESIS) | statement_list) else_clause?
                        ;
loop                    : KEYWORD_LOOP statement_list KEYWORD_ENDLOOP
                        ;
                        // must appear in a loop
exitwhen                : KEYWORD_EXITWHEN expression
                        ;
return_statement        : KEYWORD_RETURN expression?
                        ;
debug                   : KEYWORD_DEBUG (set | call | if_statement | loop)
                        ;

//----------------------------------------------------------------------
// Expressions
//----------------------------------------------------------------------
expression              : parenthesis
                        | func_call
                        | array_ref
                        | (boolean_expression | binary_operation)
                        | unary_operation
                        | function_reference
                        | const_statement
                        | identifier
                        ;
binary_operation        : terminal (('+'|'-'|'*'|'/'|'>'|'<'|'=='|'!='|'>='|'<=') terminal)
                        ;
unary_operation         : ('+'|'-'|'not') terminal
                        ;
boolean_expression      : ('and'|'not')? terminal (('=='|'!=') terminal) ('and'|'or')?
                        ;
terminal                : factor*/(factor)
                        ;
factor                  : identifier
                        | const_statement
                        | parenthesis
                        | brackets
                        ;
parenthesis             : OPENPARENTHESIS expression CLOSEPARENTHESIS
                        ;
brackets                : OPENBRACKET expression CLOSEBRACKET
                        ;
                        // expression must be integer or real when used with unary '+'
func_call               : identifier OPENPARENTHESIS args? CLOSEPARENTHESIS
                        ;
array_ref               : identifier OPENBRACKET expression CLOSEBRACKET
                        ;
function_reference      : KEYWORD_FUNCTION identifier
                        ;
const_statement         : INTEGER_CONST | REAL_CONST | BOOL_CONST | STRING_CONST | ASSIGNMENT_TYPE_NULL
                        ;
FOURCC                  : QUOTATION_SINGLE . . . . QUOTATION_SINGLE
                        ;
INTEGER_CONST           : DECIMAL | OCTAL | HEXIDECIMAL | FOURCC
                        ;
DECIMAL                 : (DIGIT)+ | (DIGIT+) '.' (DIGIT+)?
                        ;
OCTAL                   : '0'..'7'+
                        ;
HEXIDECIMAL             : '$'(DIGIT|'a'..'f'|'A'..'F')+ | '0'('x'|'X')(DIGIT|'a'..'f'|'A'..'F')+
                        ;
REAL_CONST              : (DIGIT)+'.'(DIGIT)* | '.'(DIGIT)+
                        ;
BOOL_CONST              : ASSIGNMENT_TYPE_TRUE | ASSIGNMENT_TYPE_FALSE
                        ;
                        // any double-quotes in the string must be escaped with \
STRING_CONST            : QUOTATION_DOUBLE .*? QUOTATION_DOUBLE
                        ;

//----------------------------------------------------------------------
// Base
//----------------------------------------------------------------------
type                    : nativetype | commontype
                        ;
identifier              : ID
                        ;

//////////////////////////////////////////////////////////////////////////////////////////////
// TYPES
//////////////////////////////////////////////////////////////////////////////////////////////
TYPE_BOOLEAN            : 'boolean'
                        ;
TYPE_CODE               : 'code'
                        ;
TYPE_HANDLE             : 'handle'
                        ;
TYPE_INTEGER            : 'integer'
                        ;
TYPE_REAL               : 'real'
                        ;
TYPE_STRING             : 'string'
                        ;
TYPE_ARRAY              : 'array'
                        ;
nativetype              : TYPE_BOOLEAN
                        | TYPE_CODE
                        | TYPE_HANDLE
                        | TYPE_INTEGER
                        | TYPE_REAL
                        | TYPE_STRING
                        | TYPE_ARRAY
                        ;
TYPE_ABILITY            : 'ability'
                        ;
TYPE_AGENT              : 'agent'
                        ;
TYPE_AIDIFFICULTY       : 'aidifficulty'
                        ;
TYPE_ALLIANCETYPE       : 'alliancetype'
                        ;
TYPE_ATTACKTYPE         : 'attacktype'
                        ;
TYPE_BLENDMODE          : 'blendmode'
                        ;
TYPE_BOOLEXPR           : 'boolexpr'
                        ;
TYPE_BUFF               : 'buff'
                        ;
TYPE_BUTTON             : 'button'
                        ;
TYPE_CAMERAFIELD        : 'camerafield'
                        ;
TYPE_CAMERASETUP        : 'camerasetup'
                        ;
TYPE_CONDITIONFUNC      : 'conditionfunc'
                        ;
TYPE_DAMAGETYPE         : 'damagetype'
                        ;
TYPE_DEFEATCONDITION    : 'defeatcondition'
                        ;
TYPE_DESTRUCTABLE       : 'destructable'
                        ;
TYPE_DIALOG             : 'dialog'
                        ;
TYPE_DIALOGEVENT        : 'dialogevent'
                        ;
TYPE_EFFECT             : 'effect'
                        ;
TYPE_EVENTID            : 'eventid'
                        ;
TYPE_FGAMESTATE         : 'fgamestate'
                        ;
TYPE_FILTERFUNC         : 'filterfunc'
                        ;
TYPE_FOGMODIFIER        : 'fogmodifier'
                        ;
TYPE_FOGSTATE           : 'fogstate'
                        ;
TYPE_FORCE              : 'force'
                        ;
TYPE_GAMECACHE          : 'gamecache'
                        ;
TYPE_GAMEDIFFICULTY     : 'gamedifficulty'
                        ;
TYPE_GAMEEVENT          : 'gameevent'
                        ;
TYPE_GAMESPEED          : 'gamespeed'
                        ;
TYPE_GAMESTATE          : 'gamestate'
                        ;
TYPE_GAMETYPE           : 'gametype'
                        ;
TYPE_GROUP              : 'group'
                        ;
TYPE_HASHTABLE          : 'hashtable'
                        ;
TYPE_IGAMESTATE         : 'igamestate'
                        ;
TYPE_IMAGE              : 'image'
                        ;
TYPE_ITEM               : 'item'
                        ;
TYPE_ITEMPOOL           : 'itempool'
                        ;
TYPE_ITEMTYPE           : 'itemtype'
                        ;
TYPE_LEADERBOARD        : 'leaderboard'
                        ;
TYPE_LIGHTNING          : 'lightning'
                        ;
TYPE_LIMITOP            : 'limitop'
                        ;
TYPE_LOCATION           : 'location'
                        ;
TYPE_MAPCONTROL         : 'mapcontrol'
                        ;
TYPE_MAPDENSITY         : 'mapdensity'
                        ;
TYPE_MAPFLAG            : 'mapflag'
                        ;
TYPE_MAPSETTING         : 'mapsettings'
                        ;
TYPE_MAPVISIBILITY      : 'mapvisibility'
                        ;
TYPE_MULTIBOARD         : 'multiboard'
                        ;
TYPE_MULTIBOARDITEM     : 'multiboarditem'
                        ;
TYPE_PATHINGTYPE        : 'pathingtype'
                        ;
TYPE_PLACEMENT          : 'placement'
                        ;
TYPE_PLAYER             : 'player'
                        ;
TYPE_PLAYERCOLOR        : 'playercolor'
                        ;
TYPE_PLAYEREVENT        : 'playerevent'
                        ;
TYPE_PLAYERGAMERESULT   : 'playergameresult'
                        ;
TYPE_PLAYERSCORE        : 'playerscore'
                        ;
TYPE_PLAYERSLOTSTATE    : 'playerslotstate'
                        ;
TYPE_PLAYERSTATE        : 'playerstate'
                        ;
TYPE_PLAYERUNITEVENT    : 'playerunitevent'
                        ;
TYPE_QUEST              : 'quest'
                        ;
TYPE_QUESTITEM          : 'questitem'
                        ;
TYPE_RACE               : 'race'
                        ;
TYPE_RACEPREFERENCE     : 'racepreference'
                        ;
TYPE_RARITYCONTROL      : 'raritycontrol'
                        ;
TYPE_RECT               : 'rect'
                        ;
TYPE_REGION             : 'region'
                        ;
TYPE_SOUND              : 'sound'
                        ;
TYPE_SOUNDTYPE          : 'soundtype'
                        ;
TYPE_STARTLOCPRIO       : 'startlocprio'
                        ;
TYPE_TERRAINDEFORMATION : 'terraindeformation'
                        ;
TYPE_TEXMAPFLAGS        : 'texmapflags'
                        ;
TYPE_TEXTTAG            : 'texttag'
                        ;
TYPE_TIMER              : 'timer'
                        ;
TYPE_TIMERDIALOG        : 'timerdialog'
                        ;
TYPE_TRACKABLE          : 'trackable'
                        ;
TYPE_TRIGGER            : 'trigger'
                        ;
TYPE_TRIGGERACTION      : 'triggeraction'
                        ;
TYPE_TRIGGERCONDITION   : 'triggercondition'
                        ;
TYPE_UBERSPLAT          : 'ubersplat'
                        ;
TYPE_UNIT               : 'unit'
                        ;
TYPE_UNITEVENT          : 'unitevent'
                        ;
TYPE_UNITPOOL           : 'unitpool'
                        ;
TYPE_UNITSTATE          : 'unitstate'
                        ;
TYPE_UNITTYPE           : 'unittype'
                        ;
TYPE_VERSION            : 'version'
                        ;
TYPE_VOLUMEGROUP        : 'volumegroup'
                        ;
TYPE_WEAPONTYPE         : 'weapontype'
                        ;
TYPE_WEATHEREFFECT      : 'weathereffect'
                        ;
TYPE_WIDGET             : 'widget'
                        ;
TYPE_WIDGETEVENT        : 'widgetevent'
                        ;
commontype              : TYPE_ABILITY
                        | TYPE_AGENT
                        | TYPE_AIDIFFICULTY
                        | TYPE_ALLIANCETYPE
                        | TYPE_ATTACKTYPE
                        | TYPE_BLENDMODE
                        | TYPE_BOOLEXPR
                        | TYPE_BUFF
                        | TYPE_BUTTON
                        | TYPE_CAMERAFIELD
                        | TYPE_CAMERASETUP
                        | TYPE_CONDITIONFUNC
                        | TYPE_DAMAGETYPE
                        | TYPE_DEFEATCONDITION
                        | TYPE_DESTRUCTABLE
                        | TYPE_DIALOG
                        | TYPE_DIALOGEVENT
                        | TYPE_EFFECT
                        | TYPE_EVENTID
                        | TYPE_FGAMESTATE
                        | TYPE_FILTERFUNC
                        | TYPE_FOGMODIFIER
                        | TYPE_FOGSTATE
                        | TYPE_FORCE
                        | TYPE_GAMECACHE
                        | TYPE_GAMEDIFFICULTY
                        | TYPE_GAMEEVENT
                        | TYPE_GAMESPEED
                        | TYPE_GAMESTATE
                        | TYPE_GAMETYPE
                        | TYPE_GROUP
                        | TYPE_HASHTABLE
                        | TYPE_IGAMESTATE
                        | TYPE_IMAGE
                        | TYPE_ITEM
                        | TYPE_ITEMPOOL
                        | TYPE_ITEMTYPE
                        | TYPE_LEADERBOARD
                        | TYPE_LIGHTNING
                        | TYPE_LIMITOP
                        | TYPE_LOCATION
                        | TYPE_MAPCONTROL
                        | TYPE_MAPDENSITY
                        | TYPE_MAPFLAG
                        | TYPE_MAPSETTING
                        | TYPE_MAPVISIBILITY
                        | TYPE_MULTIBOARD
                        | TYPE_MULTIBOARDITEM
                        | TYPE_PATHINGTYPE
                        | TYPE_PLACEMENT
                        | TYPE_PLAYER
                        | TYPE_PLAYERCOLOR
                        | TYPE_PLAYEREVENT
                        | TYPE_PLAYERGAMERESULT
                        | TYPE_PLAYERSCORE
                        | TYPE_PLAYERSLOTSTATE
                        | TYPE_PLAYERSTATE
                        | TYPE_PLAYERUNITEVENT
                        | TYPE_QUEST
                        | TYPE_QUESTITEM
                        | TYPE_RACE
                        | TYPE_RACEPREFERENCE
                        | TYPE_RARITYCONTROL
                        | TYPE_RECT
                        | TYPE_REGION
                        | TYPE_SOUND
                        | TYPE_SOUNDTYPE
                        | TYPE_STARTLOCPRIO
                        | TYPE_TERRAINDEFORMATION
                        | TYPE_TEXMAPFLAGS
                        | TYPE_TEXTTAG
                        | TYPE_TIMER
                        | TYPE_TIMERDIALOG
                        | TYPE_TRACKABLE
                        | TYPE_TRIGGER
                        | TYPE_TRIGGERACTION
                        | TYPE_TRIGGERCONDITION
                        | TYPE_UBERSPLAT
                        | TYPE_UNIT
                        | TYPE_UNITEVENT
                        | TYPE_UNITPOOL
                        | TYPE_UNITSTATE
                        | TYPE_UNITTYPE
                        | TYPE_VERSION
                        | TYPE_VOLUMEGROUP
                        | TYPE_WEAPONTYPE
                        | TYPE_WEATHEREFFECT
                        | TYPE_WIDGET
                        | TYPE_WIDGETEVENT
                        ;
//////////////////////////////////////////////////////////////////////////////////////////////

ASSIGNMENT_TYPE_NULL    : 'null'
                        ;
ASSIGNMENT_TYPE_INTEGER : DIGIT
                        ;
ASSIGNMENT_TYPE_REAL    : REAL_CONST
                        ;
ASSIGNMENT_TYPE_TRUE    : 'true'
                        ;
ASSIGNMENT_TYPE_FALSE   : 'false'
                        ;
KEYWORD_DEBUG           : 'debug'
                        ;
KEYWORD_EXTENDS         : 'extends'
                        ;
KEYWORD_NATIVE          : 'native'
                        ;
KEYWORD_FUNCTION        : 'function'
                        ;
KEYWORD_ENDFUNCTION     : 'endfunction'
                        ;
KEYWORD_TAKES           : 'takes'
                        ;
KEYWORD_NOTHING         : 'nothing'
                        ;
KEYWORD_RETURNS         : 'returns'
                        ;
KEYWORD_CALL            : 'call'
                        ;
KEYWORD_RETURN          : 'return'
                        ;
KEYWORD_GLOBALS         : 'globals'
                        ;
KEYWORD_ENDGLOBALS      : 'endglobals'
                        ;
KEYWORD_LOCAL           : 'local'
                        ;
KEYWORD_CONSTANT        : 'constant'
                        ;
KEYWORD_SET             : 'set'
                        ;
KEYWORD_IF              : 'if'
                        ;
KEYWORD_THEN            : 'then'
                        ;
KEYWORD_ELSEIF          : 'elseif'
                        ;
KEYWORD_ELSE            : 'else'
                        ;
KEYWORD_ENDIF           : 'endif'
                        ;
KEYWORD_LOOP            : 'loop'
                        ;
KEYWORD_EXITWHEN        : 'exitwhen'
                        ;
KEYWORD_ENDLOOP         : 'endloop'
                        ;
KEYWORD_TYPE            : 'type'
                        ;
ID                      : (LETTER)((LETTER|DIGIT|'_'+)*)?
                        ;
fragment
LETTER                  : '\u0024' // $
                        | '\u0041'..'\u005a' // A-Z
                        | '\u005f' // _
                        | '\u0061'..'\u007a' // a-z
                        | '\u00c0'..'\u00d6' // Latin Capital Letter A with grave - Latin Capital letter O with diaeresis
                        | '\u00d8'..'\u00f6' // Latin Capital letter O with stroke - Latin Small Letter O with diaeresis
                        | '\u00f8'..'\u00ff' // Latin Small Letter O with stroke - Latin Small Letter Y with diaeresis
                        | '\u0100'..'\u1fff' // Latin Capital Letter A with macron - Latin Small Letter O with stroke and acute
                        | '\u3040'..'\u318f' // Hiragana
                        | '\u3300'..'\u337f' // CJK compatibility
                        | '\u3400'..'\u3d2d' // CJK compatibility
                        | '\u4e00'..'\u9fff' // CJK compatibility
                        | '\uf900'..'\ufaff' // CJK compatibility
                        ;

fragment
DIGIT                   : '0'..'9'/*'\u0030'..'\u0039' // 0-9
                        | '\u0660'..'\u0669' // Arabic-Indic Digit 0-9
                        | '\u06f0'..'\u06f9' // Extended Arabic-Indic Digit 0-9
                        | '\u0966'..'\u096f' // Devanagari 0-9
                        | '\u09e6'..'\u09ef' // Bengali 0-9
                        | '\u0a66'..'\u0a6f' // Gurmukhi 0-9
                        | '\u0ae6'..'\u0aef' // Gujarati 0-9
                        | '\u0b66'..'\u0b6f' // Oriya 0-9
                        | '\u0be7'..'\u0bef' // Tami 0-9
                        | '\u0c66'..'\u0c6f' // Telugu 0-9
                        | '\u0ce6'..'\u0cef' // Kannada 0-9
                        | '\u0d66'..'\u0d6f' // Malayala 0-9
                        | '\u0e50'..'\u0e59' // Thai 0-9
                        | '\u0ed0'..'\u0ed9' // Lao 0-9
                        | '\u1040'..'\u1049' // Myanmar 0-9?*/
                        ;
OPENPARENTHESIS         : '('
                        ;
CLOSEPARENTHESIS        : ')'
                        ;
OPENBRACKET             : '['
                        ;
CLOSEBRACKET            : ']'
                        ;
QUOTATION_DOUBLE        : '"'
                        ;
QUOTATION_SINGLE        : '\''
                        ;
COMMA                   : ','
                        ;
WS                      : (' ' | '\t' | '\n'+)+ {skip();}
                        ;
LINE_COMMENT            : '//' ~[\r\n]* -> channel(HIDDEN)
                        ;


And from ANTLRWorks...
THIS file is the output log from using TestRig (starting with the first error), and here is an image of the generated parse tree where the first error occurs:



TO ANYONE who can help me fix this issue: I will gladly upvote your answers, as well as your next 3 questions if you are marked as the answer to this question. Thanks!


Solution

  • When looking at the BNF rules of an if statement:

    ifthenelse
             ::= 'if' expr 'then' newline statement_list else_clause? 'endif'
    else_clause
             ::= 'else' newline statement_list
               | 'elseif' expr 'then' newline statement_list else_clause?
    

    your translation:

    if_statement            : KEYWORD_IF expression KEYWORD_THEN statement_list else_clause? KEYWORD_ENDIF
                            ;
    else_clause             : KEYWORD_ELSEIF ((OPENPARENTHESIS expression CLOSEPARENTHESIS) | expression) KEYWORD_THEN statement_list
                            | KEYWORD_ELSE ((OPENPARENTHESIS statement_list CLOSEPARENTHESIS) | statement_list) else_clause?
                            ;
    

    is incorrect: the optional trailing else_clause belongs to the KEYWORD_ELSEIF alternative, but you have it on the KEYWORD_ELSE alternative.

    It should be:

    if_statement            : KEYWORD_IF expression KEYWORD_THEN statement_list else_clause? KEYWORD_ENDIF
                            ;
    else_clause             : KEYWORD_ELSE statement_list
                            | KEYWORD_ELSEIF expression KEYWORD_THEN statement_list else_clause?
                            ;
    

    And note that you don't need ((OPENPARENTHESIS expression CLOSEPARENTHESIS) | expression), since an expression already matches '(' expression ')'.
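
    As a side note, the same if/elseif/else shape can also be written without recursion. This is only a sketch of an equivalent formulation, not something taken from the Jass BNF: elseif_clause is a hypothetical rule name introduced here, and else_clause is redefined to cover only the final else branch.

    ```antlr
    // Sketch only: a flat alternative to the recursive else_clause.
    // "elseif_clause" is a hypothetical rule name, not part of the original grammar.
    if_statement            : KEYWORD_IF expression KEYWORD_THEN statement_list
                              elseif_clause*      // zero or more elseif branches
                              else_clause?        // at most one trailing else branch
                              KEYWORD_ENDIF
                            ;
    elseif_clause           : KEYWORD_ELSEIF expression KEYWORD_THEN statement_list
                            ;
    else_clause             : KEYWORD_ELSE statement_list
                            ;
    ```

    Both forms accept zero or more elseif branches followed by at most one else branch. The flat version keeps the elseif branches as siblings in the parse tree instead of nesting each one inside the previous one.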

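    But the observations above are not the cause of your problem(s). The real issue is that your grammar does not account for unary expressions as operands: binary_operation and boolean_expression are built from terminal, and neither terminal nor factor can match a unary_operation. As a result, the -1 in the expression si__DroneSystem___data_V[this]!=-1 is never matched.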

    Change your expression rule into this:

    expression              : OPENPARENTHESIS expression CLOSEPARENTHESIS
                            | OPENBRACKET expression CLOSEBRACKET
                            | func_call
                            | array_ref
                            | function_reference
                            | const_statement
                            | identifier
                            | '+' expression
                            | '-' expression
                            | 'not' expression
                            | expression ('*'|'/') expression
                            | expression ('+'|'-') expression
                            | expression ('>'|'<'|'=='|'!='|'>='|'<=') expression
                            | expression ('and'|'or') expression
                            ;
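
    If the parse tree is later walked by an interpreter, it may help to label the alternatives. The following is a sketch of the same rule under that assumption: the label names (ParenExpr, UnaryExpr, and so on) and the op= token label are hypothetical choices, and the three unary alternatives are merged into one, which leaves their precedence unchanged. Note that ANTLR 4 requires either all alternatives of a rule to be labeled or none of them.

    ```antlr
    // Sketch only: the expression rule with labeled alternatives.
    // All "# Name" labels and the "op" token label are hypothetical names.
    expression              : OPENPARENTHESIS expression CLOSEPARENTHESIS              # ParenExpr
                            | OPENBRACKET expression CLOSEBRACKET                      # BracketExpr
                            | func_call                                                # CallExpr
                            | array_ref                                                # ArrayRefExpr
                            | function_reference                                       # FuncRefExpr
                            | const_statement                                          # ConstExpr
                            | identifier                                               # IdExpr
                            | op=('+'|'-'|'not') expression                            # UnaryExpr
                            | expression op=('*'|'/') expression                       # MulDivExpr
                            | expression op=('+'|'-') expression                       # AddSubExpr
                            | expression op=('>'|'<'|'=='|'!='|'>='|'<=') expression   # CompareExpr
                            | expression op=('and'|'or') expression                    # LogicExpr
                            ;
    ```

    In a left-recursive rule, ANTLR 4 gives alternatives that appear earlier a higher precedence, so in both versions the unary operators bind tighter than the binary ones and -1 is grouped before != is applied. One consequence worth checking against the Jass specification: with this ordering, not a == b groups as (not a) == b. Moving the 'not' alternative below the comparison alternative would give not (a == b) instead.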
    

    Now input like this:

    if this==null then
      return
    elseif(si__DroneSystem___data_V[this]!=-1)then
      return
    endif
    

    will be parsed as follows:

    [image: parse tree for the input above]