
ANTLR: why does the parser expect a function call when I wrote a print command?


My ANTLR grammar expects a function call, but in the example code for the compiler built with ANTLR I wrote a print command. Does anyone know why this happens and how to fix it? The print command is written as RetroBox.show(); and should be recognised by descending through blockStatement, statement and localFunctionCall down to printCommand.

Here is my ANTLR grammar:

grammar Mars;
// ******************************LEXER BEGIN*****************************************

// Keywords
FUNC:                           'func';
ENTRY:                          'entry';
VARI:                           'vari';
VARF:                           'varf';
VARC:                           'varc';
VARS:                           'vars';
LET:                            'let';
INCREMENTS:                     'increments';
RETROBOX:                       'retrobox';
SHOW:                           'show';

// Literals

DECIMAL_LITERAL:    ('0' | [1-9] (Digits? | '_'+ Digits)) [lL]?;

FLOAT_LITERAL:      (Digits '.' Digits? | '.' Digits) ExponentPart? [fFdD]?
         |       Digits (ExponentPart [fFdD]? | [fFdD])
         ;

CHAR_LITERAL:       '\'' (~['\\\r\n] | EscapeSequence) '\'';

STRING_LITERAL:     '"' (~["\\\r\n] | EscapeSequence)* '"';

// Separators

ORBRACKET:                          '(';
CRBRACKET:                          ')';
OEBRACKET:                          '{';
CEBRACKET:                          '}';
SEMI:                               ';';
POINT:                              '.';

// Operators

ASSIGN:             '=';

// Whitespace and comments

WS:                 [ \t\r\n\u000C]+ -> channel(HIDDEN);
COMMENT:            '/*' .*? '*/'    -> channel(HIDDEN);
LINE_COMMENT:       '//' ~[\r\n]*    -> channel(HIDDEN);

// Identifiers

IDENTIFIER:         Letter LetterOrDigit*;

// Fragment rules

fragment ExponentPart
    : [eE] [+-]? Digits
    ;  

fragment EscapeSequence
    : '\\' [btnfr"'\\]
    | '\\' ([0-3]? [0-7])? [0-7]
    | '\\' 'u'+ HexDigit HexDigit HexDigit HexDigit
    ;

fragment HexDigits
    : HexDigit ((HexDigit | '_')* HexDigit)?
    ;

fragment HexDigit
    : [0-9a-fA-F]
    ;

fragment Digits
    : [0-9] ([0-9_]* [0-9])?
    ;

fragment LetterOrDigit
    : Letter
    | [0-9]
    ;

fragment Letter
    : [a-zA-Z$_] // these are the "java letters" below 0x7F
    | ~[\u0000-\u007F\uD800-\uDBFF] // covers all characters above 0x7F which are not a surrogate
    | [\uD800-\uDBFF] [\uDC00-\uDFFF] // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
    ;

// *******************************LEXER     END****************************************

// *****************************PARSER BEGIN*****************************************

program
    : mainfunction  #Programm
    | /*EMPTY*/              #Garnichts
    ;

mainfunction
    : FUNC VARI ENTRY ORBRACKET CRBRACKET block  #NormaleHauptmethode
    ;

block
    : '{' blockStatement '}'   #CodeBlock
    | /*EMPTY*/                #EmptyCodeBlock
    ;

blockStatement
    : statement* #Befehl
    ;

statement
    : localVariableDeclaration
    | localVariableInitialization
    | localFunctionImplementation
    | localFunctionCall
    ;

expression
    : left=expression op='%'
    | left=expression op=('*' | '/') right=expression
    | left=expression op=('+' | '-') right=expression
    | neg='-' right=expression
    | number
    | IDENTIFIER
    | '(' expression ')'
    ;

number
    : DECIMAL_LITERAL
    | FLOAT_LITERAL
    ;

localFunctionImplementation
    : FUNC primitiveType IDENTIFIER ORBRACKET CRBRACKET block #Methodenimplementierung
    ;

localFunctionCall
    : IDENTIFIER ORBRACKET CRBRACKET SEMI #Methodenaufruf
    | printCommand #RetroBoxShowCommand
    ;

printCommand
    : RETROBOX POINT SHOW ORBRACKET params=primitiveLiteral CRBRACKET SEMI     #PrintCommandWP
    ;

localVariableDeclaration
    : varTypeDek=primitiveType IDENTIFIER SEMI #Variablendeklaration
    ;

localVariableInitialization
    : varTypeIni=primitiveType IDENTIFIER ASSIGN varValue=primitiveLiteral     SEMI #VariableninitKonst
    | varTypeIni=primitiveType IDENTIFIER ASSIGN varValue=expression SEMI #VariableninitExpr
    ;

primitiveLiteral
    : DECIMAL_LITERAL
    | FLOAT_LITERAL
    | STRING_LITERAL
    | CHAR_LITERAL
    ;

primitiveType
    : VARI
    | VARC
    | VARF
    | VARS
    ;

// ******************************PARSER END****************************************

Here is my example code:

func vari entry()
{
    RetroBox.show("Hallo"); //Should be recognised as print-command
}

And here is the AST printed by ANTLR:

AST from Compiler


Solution

  • The problem is that your RETROBOX keyword is defined as 'retrobox', but your example code spells it 'RetroBox'. Lexer literals are case-sensitive, so ANTLR tokenizes 'RetroBox' as an IDENTIFIER, and the '.' that follows is unexpected.

    ANTLR should emit an error: "line 3:12 mismatched input '.' expecting '('".

    It then attempts to recover and continue parsing. It tries single-token deletion (simply ignoring the '.') and finds that this works, except that the rule it now matches is #Methodenaufruf instead of #RetroBoxShowCommand.
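
A minimal sketch of a fix: either change the example code to retrobox.show("Hallo"); or adjust the lexer rule so it accepts the capitalised spelling. The replacement rules below are illustrative assumptions, not part of the original grammar. Either one must stay above IDENTIFIER, because when two lexer rules match the same text the one defined first wins.

```antlr
// Option A: match the exact spelling used in the example code
RETROBOX:                       'RetroBox';

// Option B (an alternative to A, use only one of them):
// accept both 'retrobox' and 'RetroBox'
// RETROBOX:                    [Rr] 'etro' [Bb] 'ox';
```

With either variant, RetroBox.show("Hallo"); should lex as RETROBOX POINT SHOW ORBRACKET STRING_LITERAL CRBRACKET SEMI, which is the token sequence printCommand expects.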