python keyword pyparsing infix-notation

How do you stop infix_notation from matching the base expression when there are no operations (pyparsing)?


I am trying to parse expressions with pyparsing using infix_notation. The problem is that it also matches lines that contain no operations at all and consist only of the base_expr argument. This matters because base_expr can match valid keywords.

This is my infix_notation call:

from pyparsing import OpAssoc, Word, infix_notation, one_of, printables

expression = infix_notation(
    Word(
        printables,
        exclude_chars="** ~ + - * / % & | ^ != == <= >= < > ! , += -= *= /= %= <<= >>= &= |= ^="
    ),
    [
        ("**", 2, OpAssoc.LEFT),
        (one_of("~ + -"), 1, OpAssoc.RIGHT),
        (one_of("* / % *= /= %="), 2, OpAssoc.LEFT),
        (one_of("<< >> <<= >>="), 2, OpAssoc.LEFT),
        (one_of("& | ^ &= |= ^="), 2, OpAssoc.LEFT),
        (one_of("+ - += -="), 2, OpAssoc.LEFT),
        (one_of("!= == <= >= < >"), 2, OpAssoc.LEFT),
        (one_of("&& ||"), 2, OpAssoc.LEFT),
        ("!", 1, OpAssoc.RIGHT),
    ],
)

The problematic match is the base expression:

Word(
    printables,
    exclude_chars="** ~ + - * / % & | ^ != == <= >= < > ! , += -= *= /= %= <<= >>= &= |= ^="
)

This matches the keyword "else", which I do not want, but it also needs to match variables in an expression like "else1 += else2".

How would you do this?


Solution

  • A common way to distinguish keywords from identifiers is to define an expression that matches any keyword, then reject it with a negative lookahead in front of the operand. This example uses the list of all Python keywords, but you can define your own list:

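    from keyword import kwlist
    import pyparsing as pp

    # matches any Python keyword as a whole word ("else", but not "else1")
    any_keyword = pp.one_of(kwlist, as_keyword=True)

    infix_term = pp.Word(
        pp.printables,
        exclude_chars="** ~ + - * / % & | ^ != == <= >= < > ! , += -= *= /= %= <<= >>= &= |= ^="
    )

    # negative lookahead: an operand is a term that is not a keyword
    operand = ~any_keyword + infix_term

    expression = pp.infix_notation(operand,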
        ... etc. ...
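
For reference, here is a minimal runnable sketch of the same keyword-lookahead idea, assuming pyparsing 3.x. The `identifier` operand, the shortened operator table, and the `matches` helper are illustrative assumptions, not part of the original grammar.

```python
from keyword import kwlist
import pyparsing as pp

# Whole-word match for any Python keyword: "else" matches, "else1" does not.
any_keyword = pp.one_of(kwlist, as_keyword=True)

# Assumed operand for this sketch: a plain identifier.
identifier = pp.Word(pp.alphas + "_", pp.alphanums + "_")

# Negative lookahead: fail if a keyword starts here, otherwise read an identifier.
operand = ~any_keyword + identifier

# Shortened operator table, just enough to exercise the operand.
expression = pp.infix_notation(
    operand,
    [
        (pp.one_of("* /"), 2, pp.OpAssoc.LEFT),
        (pp.one_of("+ - += -="), 2, pp.OpAssoc.LEFT),
    ],
)

def matches(text):
    """Return True if the whole string parses as an expression."""
    try:
        expression.parse_string(text, parse_all=True)
        return True
    except pp.ParseBaseException:
        return False

print(matches("else"))            # keyword alone is rejected
print(matches("else1 += else2"))  # identifiers that merely start with a keyword are accepted
```

Note that a bare non-keyword operand such as "x" still parses on its own; the lookahead only keeps keywords out of the operand position.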
    

    Note that your Word(printables, ...) expression for an infix_term will match almost anything, including ......, integers, floats, etc. Also, the exclude_chars argument does not split its string into operators; it treats the string as a set of individual characters. So you cannot use "-10" as a term, since "-" is one of the excluded characters. Give a little more thought to how best to define your operands.
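
The effect of exclude_chars can be checked with a short sketch, assuming pyparsing 3.x. The `first_token` helper is a hypothetical name used only for this illustration.

```python
import pyparsing as pp

# Same term definition as in the question. exclude_chars is used as a set of
# single characters, so "=", "!", "<", "-" etc. are each excluded on their own.
term = pp.Word(
    pp.printables,
    exclude_chars="** ~ + - * / % & | ^ != == <= >= < > ! , += -= *= /= %= <<= >>= &= |= ^=",
)

def first_token(text):
    """Return the first term matched at the start of text, or None if there is none."""
    try:
        return term.parse_string(text)[0]
    except pp.ParseException:
        return None

print(first_token("-10"))     # None: "-" is an excluded character
print(first_token("a=b"))     # "a": the match stops at "=", also excluded
print(first_token("3.14"))    # "3.14": numbers are accepted as terms
print(first_token("......"))  # "......": "." is not excluded
```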

    Lastly, your infix_notation list of operators is pretty long, and this will be a sloooooooowwwww parser if you don't enable packrat parsing (using ParserElement.enable_packrat()).
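
A minimal sketch of enabling packrat parsing, assuming pyparsing 3.x. The call is placed right after the import, before any grammar is built; the small grammar here is an illustrative stand-in for the full operator table.

```python
import pyparsing as pp

# Turn on memoization of parse attempts before defining the grammar.
pp.ParserElement.enable_packrat()

# Illustrative operand and operator levels (assumptions for this sketch).
operand = pp.Word(pp.alphas + "_", pp.alphanums + "_")
expression = pp.infix_notation(
    operand,
    [
        (pp.one_of("* /"), 2, pp.OpAssoc.LEFT),
        (pp.one_of("+ -"), 2, pp.OpAssoc.LEFT),
    ],
)

result = expression.parse_string("a + b * c", parse_all=True).as_list()
print(result)
```

Packrat parsing changes only the speed of the parser, not the results, so it can be switched on without touching the rest of the grammar.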