c#, expression, antlr, grammar

Multiple nested expressions in ANTLR


The parser does not recognize the equality expression: extraneous input '=' expecting {<EOF>, '~', '(', OPERATOR, IDENTIFIER, NUMBER, STRING}

Even the error is unclear: it says it expects an OPERATOR, but = is a defined operator.

Also, I get 2 member access expressions instead of 3.

This is the grammar:

grammar xxx;
 
parse: expression+ EOF; 
 
expression:
    expression op=OPERATOR expression                       #binaryExpression
    | op=OPERATOR expression                                #unaryPrefixExpression
    | expression op=OPERATOR                                #unarPostfixExpression
    | member_expression                                     #memberExpression
    | OPENING_PARENTHESIS expression CLOSING_PARENTHESIS    #parenthesisExpression 
    | STRING                                                #stringExpression
    | NUMBER                                                #numberExpression
    | NEGATE expression                                     #negationExpression
    ;

member_expression: 
    IDENTIFIER (DOT(IDENTIFIER DOT?))*
    ;

// operators
PLUS: '+' ;
MINUS: '-' ;
BIGGER_THAN: '>' ;
LESS_THAN: '<' ;
BIGGER_THAN_OR_EQUALS: '>=' ;
LESS_THAN_OR_EQUALS: '<=' ;
NEGATE: '~' ;
EQUALITY: '=' ;

OPENING_PARENTHESIS: '(' ;
CLOSING_PARENTHESIS: ')' ;

fragment LOGICAL_OPERATOR:
    | EQUALITY
    | BIGGER_THAN_OR_EQUALS 
    | LESS_THAN_OR_EQUALS
    | LESS_THAN
    | BIGGER_THAN
    ;

OPERATOR: 
    PLUS 
    | MINUS 
    | NEGATE 
    | LOGICAL_OPERATOR
    ;

DOT: '.' ;
IDENTIFIER: [a-zA-Z]+[a-zA-Z0-9_]* ;
    
// literals
NUMBER: [0-9] + ('.' [0-9] +)? ;
STRING : '"' .*? '"' ;

WS: [ \t\n]+ -> skip ;
ANY: . ;


This is the expression:

context.Previous.Output.previous_value2 = 123

Tree string:

([] ([6] ([16 6] context . Previous .)) ([6] ([16 6] Output . previous_value2)) = ([6] 123) <EOF>)

As you can see, there are 2 member access expressions, then an unrecognized equality operator, then a number expression.

I want to get:

3 separate member access expressions

1 equality expression


Solution

  • Lexer rules always match in one way: the lexer tries to match as many characters as possible, and when 2 (or more) rules match the same characters, the rule defined first wins. So take the rules PLUS and OPERATOR:

    PLUS: '+' ;
    
    ...
    
    OPERATOR: 
        PLUS 
        | MINUS 
        | NEGATE 
        | LOGICAL_OPERATOR
        ;
    

    for the input string "+", the lexer will always produce a PLUS token, never an OPERATOR token.
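To make the two lexer rules concrete, here is a minimal Java sketch that imitates them outside ANTLR. The class `LexerRuleOrderSketch` and its methods are hypothetical names invented for this illustration, and the rule table is hand-copied from the grammar in the question. It only models literal tokens and is not how the real lexer is implemented.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not ANTLR code): picks a token type the way the answer
// describes, longest match first, ties broken by rule order.
final class LexerRuleOrderSketch {

    // Rule name -> literals it can match, in the order the rules are defined.
    // OPERATOR is listed last, as in the original grammar.
    static Map<String, List<String>> rules() {
        Map<String, List<String>> rules = new LinkedHashMap<>();
        rules.put("PLUS", List.of("+"));
        rules.put("MINUS", List.of("-"));
        rules.put("BIGGER_THAN", List.of(">"));
        rules.put("LESS_THAN", List.of("<"));
        rules.put("BIGGER_THAN_OR_EQUALS", List.of(">="));
        rules.put("LESS_THAN_OR_EQUALS", List.of("<="));
        rules.put("NEGATE", List.of("~"));
        rules.put("EQUALITY", List.of("="));
        rules.put("OPERATOR", List.of("+", "-", "~", "=", ">=", "<=", "<", ">"));
        return rules;
    }

    // Returns the name of the rule that wins at position `pos` of `input`,
    // or null if no rule matches there.
    static String winningRule(String input, int pos) {
        String winner = null;
        int longest = 0;
        for (Map.Entry<String, List<String>> rule : rules().entrySet()) {
            for (String literal : rule.getValue()) {
                // A strictly longer match replaces the current winner; an equally
                // long match from a later rule never does (first rule wins).
                if (input.startsWith(literal, pos) && literal.length() > longest) {
                    winner = rule.getKey();
                    longest = literal.length();
                }
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        for (String op : List.of("+", "=", ">=", "~")) {
            System.out.println("'" + op + "' -> " + winningRule(op, 0));
        }
    }
}
```

In this sketch OPERATOR never wins: every literal it covers is already claimed by an earlier rule that matches the same number of characters, which is why the parser in the question never receives an OPERATOR token.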

    The solution: change the OPERATOR and LOGICAL_OPERATOR lexer rules into parser rules:

    grammar xxx;
    
    parse: expression+ EOF;
    
    expression:
        expression op=operator expression                       #binaryExpression
        | op=unary_operator expression                          #unaryPrefixExpression
        | expression op=operator                                #unarPostfixExpression
        | member_expression                                     #memberExpression
        | OPENING_PARENTHESIS expression CLOSING_PARENTHESIS    #parenthesisExpression
        | STRING                                                #stringExpression
        | NUMBER                                                #numberExpression
        | NEGATE expression                                     #negationExpression
        ;
    
    member_expression:
        IDENTIFIER (DOT(IDENTIFIER DOT?))*
        ;
    
    operator:
        EQUALITY
        | BIGGER_THAN_OR_EQUALS
        | LESS_THAN_OR_EQUALS
        | LESS_THAN
        | BIGGER_THAN
        | unary_operator
        ;
    
    unary_operator:
        PLUS
        | MINUS
        | NEGATE
        ;
    
    // operators
    PLUS: '+' ;
    MINUS: '-' ;
    BIGGER_THAN: '>' ;
    LESS_THAN: '<' ;
    BIGGER_THAN_OR_EQUALS: '>=' ;
    LESS_THAN_OR_EQUALS: '<=' ;
    NEGATE: '~' ;
    EQUALITY: '=' ;
    
    OPENING_PARENTHESIS: '(' ;
    CLOSING_PARENTHESIS: ')' ;
    
    DOT: '.' ;
    IDENTIFIER: [a-zA-Z]+[a-zA-Z0-9_]* ;
    
    // literals
    NUMBER: [0-9] + ('.' [0-9] +)? ;
    STRING : '"' .*? '"' ;
    
    WS: [ \t\n]+ -> skip ;
    ANY: . ;
    

    Also, because of the leading |, the following rule has an empty first alternative and would match an empty string:

    fragment LOGICAL_OPERATOR:
        | EQUALITY
        | BIGGER_THAN_OR_EQUALS 
        | LESS_THAN_OR_EQUALS
        | LESS_THAN
        | BIGGER_THAN
        ;
    

    You probably meant:

    fragment LOGICAL_OPERATOR:
        EQUALITY
        | BIGGER_THAN_OR_EQUALS 
        | LESS_THAN_OR_EQUALS
        | LESS_THAN
        | BIGGER_THAN
        ;
    

    Btw, a better way to print the tree is to use toStringTree(Parser):

    // Needs org.antlr.v4.runtime.* and org.antlr.v4.runtime.tree.ParseTree
    String source = "context.Previous.Output.previous_value2 = 123";
    xxxLexer lexer = new xxxLexer(CharStreams.fromString(source));
    xxxParser parser = new xxxParser(new CommonTokenStream(lexer));
    ParseTree tree = parser.parse();
    System.out.println(tree.toStringTree(parser));
    

    which will print:

    (parse (expression (member_expression context . Previous .)) (expression (expression (member_expression Output . previous_value2)) (operator =) (expression 123)) <EOF>)
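The printed tree still contains two member_expression nodes. A likely explanation, which is an assumption and not part of the original answer, is that the optional DOT? inside the loop takes the dot after Previous, so the loop cannot continue with Output. The Java sketch below hand-simulates that greedy matching on a pre-tokenized input. `MemberExpressionSketch` and its methods are hypothetical names, and this is not ANTLR code.

```java
import java.util.List;

// Hypothetical sketch (not ANTLR code): hand-simulates greedy matching of
//   IDENTIFIER (DOT (IDENTIFIER DOT?))*
// on a token list, assuming the optional DOT is taken whenever it is present.
final class MemberExpressionSketch {

    static boolean isIdentifier(String token) {
        return token.matches("[a-zA-Z]+[a-zA-Z0-9_]*");
    }

    // Returns how many tokens, starting at `start`, the rule would consume.
    static int consumed(List<String> tokens, int start) {
        int i = start;
        if (i >= tokens.size() || !isIdentifier(tokens.get(i))) {
            return 0;                       // the rule must begin with IDENTIFIER
        }
        i++;
        // Loop body: DOT IDENTIFIER DOT?
        while (i + 1 < tokens.size()
                && tokens.get(i).equals(".")
                && isIdentifier(tokens.get(i + 1))) {
            i += 2;                         // DOT IDENTIFIER
            if (i < tokens.size() && tokens.get(i).equals(".")) {
                i++;                        // optional trailing DOT, taken greedily
            }
        }
        return i - start;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of(
                "context", ".", "Previous", ".", "Output", ".", "previous_value2", "=", "123");
        int first = consumed(tokens, 0);
        int second = consumed(tokens, first);
        System.out.println(tokens.subList(0, first));
        System.out.println(tokens.subList(first, first + second));
    }
}
```

Under these assumptions the first match stops after "context . Previous ." and the second covers "Output . previous_value2", the same split as in the tree above. If a single node for the whole chain is wanted, a rule without the optional dot, such as IDENTIFIER (DOT IDENTIFIER)*, would be one thing to try.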