Search code examples
pythonparsinggrammarambiguous-grammarlark-parser

How can I make this grammar unambiguous?


I'm trying to write a parser for a simple language:

parser = Lark("""
    ?start: arithmetic_expr | boolean_expr

    // relational operation
    ?rel_op : arithmetic_expr ("<" | ">" | "==" | "!=") arithmetic_expr 

    // boolean expression
    ?boolean_expr: bool_term
            | boolean_expr "or" bool_term
    ?bool_term: bool_atom
                | bool_term "and" bool_atom
    ?bool_atom: "true"
                | "false"
                | "!" bool_atom
                | rel_op
                | ID
                | "(" boolean_expr ")"

    // arithmetic expression
    ?arithmetic_expr: product
        | arithmetic_expr "+" product
        | arithmetic_expr "-" product
    ?product: atom
        | product "*" atom
        | product "/" atom
    ?atom: NUMBER
            | "-" atom
            | ID
            | "(" arithmetic_expr ")"


    %import common.CNAME -> ID
    %import common.NUMBER
    
    """, parser='lalr', start='start')

When I run it, I get the following error:

lark.exceptions.GrammarError: Reduce/Reduce collision in Terminal('RPAR') between the following rules: 
                - <bool_atom : ID>
                - <atom : ID>

I understand that this is happening because, if we imagine that the parser was built without an error, and then wrote parser.parse('foo'), both arithmetic_expr and boolean_expr would be "correct" derivations. Also, as you can see I'm using LALR which is a strictly deterministic algorithm and can't handle ambiguities.

So my question is, how can I make this grammar unambiguous? I can't seem to find the solution.


Solution

  • You can't and you shouldn't.

    Don't try to use a grammar to do type-checking. Types are semantic, not syntactic. The lexicon can't tell you whether an ID is boolean or arithmetic (unless you enforce Hungarian naming), so the grammar can only tell "sometimes". And sometimes isn't good enough.

    But it doesn't matter. You can easily do the type analysis during the semantic passes after you've built the syntax tree. Until then, an expression is an expression.

    What I'd do is get rid of bool_atom. Just use a complete expression hierarchy with (boolean) expression at the top and atom at the bottom, placing rel_op where it would naturally go (in bool_term, instead of bool_atom). However, that does change the grammar in one way. In the existing grammar, the expression

    !a < b
    

    means !(a < b). That may be what you expected, and if so you can accommodate it with a bit of work, but it's a bit different from the semantics of most languages I know. My suggested grammar would require the use of parentheses in that case.