Tags: parsing, antlr, antlr4, operator-precedence

Interpretation variants of binary operators


I'm writing a grammar for a language that contains some binary operators that can also be used as unary operators (with the argument on the right side of the operator), and for better error recovery I'd like them to be usable as nular operators as well.
My simplified grammar looks like this:
start: code EOF ;

code:
    (binaryExpression SEMICOLON?)*
;

binaryExpression:
    binaryExpression BINARY_OPERATOR binaryExpression //TODO: check before primaryExpression
    | primaryExpression
;

    primaryExpression:
            unaryExpression
            | nularExpression
    ;

    unaryExpression:
        operator primaryExpression
        | BINARY_OPERATOR primaryExpression
    ;

    nularExpression:
        operator
        | BINARY_OPERATOR
        | NUMBER    
        | STRING
    ;

        operator:
            ID
        ;

BINARY_OPERATOR is just a set of defined keywords that are fed into the parser.
My problem is that ANTLR prefers to use BINARY_OPERATORs as unary expressions (or nular ones if there is no other choice) instead of trying to use them in a binary expression, which is what I need.
For example, consider the following input: for varDec from one to twelve do something, where from, to and do are binary operators. The parser produces the following output:
[Image: parse tree]
As you can see, it interprets all binary operators as unary ones.

What I'm trying to achieve is the following: try to match each BINARY_OPERATOR in a binary expression first; only if that is not possible, try to match it as a unary expression; and if that isn't possible either, consider it a nular expression (which can only be the case if the BINARY_OPERATOR is the only content of an expression).

Does anyone have an idea how to achieve the desired behaviour?


Solution

  • A fairly standard approach is to use a single recursive rule to establish the acceptable expression syntax. ANTLR is left-associative by default, so op expr meets the stated unary-operator requirement of "argument to the right side of the operator". See p. 70 of TDAR (The Definitive ANTLR 4 Reference) for a further discussion of associativity.

    Ex1: -y+x -> binaryOp{unaryOp{-, literal}, +, literal}

    Ex2: -y+-x -> binaryOp{unaryOp{-, literal}, +, unaryOp{-, literal}}

    expr
        : LPAREN expr RPAREN   #parens    // ANTLR 4 requires labels on all alternatives or none
        | expr op expr         #binaryOp
      //| op expr              #unaryOp   // standard formulation
        | op literal           #unaryOp   // limited formulation
        | op                   #errorOp
        | literal              #atom
        ;
    
    op  : .... ;
    
    literal
        : KEYWORD
        | ID
        | NUMBER    
        | STRING
        ;
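
To make the intended tree shapes concrete, here is a minimal hand-written recursive-descent sketch in Python that mirrors the alternatives of the expr rule above (limited formulation). It is not ANTLR-generated code and does not reproduce ANTLR's prediction algorithm. The names OPS, tokenize, Parser and parse are hypothetical, and the operator set is an assumption, since the op rule is elided in the grammar.

```python
# Hypothetical operator set: the grammar's `op` rule is elided, so these are stand-ins.
OPS = {"+", "-", "for", "from", "to", "do"}


def tokenize(text):
    """Pad the single-character symbols with spaces, then split on whitespace."""
    for sym in "+-()":
        text = text.replace(sym, f" {sym} ")
    return text.split()


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def is_literal(self, tok):
        return tok is not None and tok not in OPS and tok not in ("(", ")")

    def parse_operand(self):
        """LPAREN expr RPAREN | op literal (unaryOp) | op (errorOp) | literal"""
        tok = self.peek()
        if tok is None or tok == ")":
            raise SyntaxError("expected an operand")
        if tok == "(":
            self.take()
            inner = self.parse_expr()
            if self.peek() != ")":
                raise SyntaxError("expected ')'")
            self.take()
            return inner
        if tok in OPS:
            op = self.take()
            if self.is_literal(self.peek()):
                return ("unaryOp", op, ("literal", self.take()))
            return ("errorOp", op)  # operator with no argument at all
        return ("literal", self.take())

    def parse_expr(self):
        """expr op expr, folded left-associatively."""
        left = self.parse_operand()
        # An operator that follows a complete operand is always treated as binary.
        while self.peek() in OPS:
            op = self.take()
            right = self.parse_operand()
            left = ("binaryOp", left, op, right)
        return left


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.parse_expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


if __name__ == "__main__":
    print(parse("-y+x"))
    print(parse("-y+-x"))
    print(parse("for varDec from one to twelve do something"))
```

In this sketch an operator is read as binary whenever it follows a complete operand, as unary when it starts an operand and is followed by a literal, and as an error (nular) operator otherwise. Under the assumed operator set, the question's sample input comes out as nested binaryOp nodes for from, to and do, with for varDec as the innermost unaryOp.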