Search code examples
python-3.xpyparsing

Parse mathematical expressions with pyparsing


I'm trying to parse a mathematical expression using pyparsing. I know i could just copy the example calculator from pyparsing site, but i want to understand it so i can add to it later. And i'm here because i tried to understand the example, and i couldn't, so i tried my best, and i got to this:

symbol = (
    pp.Literal("^") |
    pp.Literal("*") |
    pp.Literal("/") |
    pp.Literal("+") |
    pp.Literal("-")
)
operation = pp.Forward()
atom = pp.Group(
    pp.Literal("(").suppress() + operation + pp.Literal(")").suppress()
) | number
operation << (pp.Group(number + symbol + number + pp.ZeroOrMore(symbol + atom)) | atom)
expression = pp.OneOrMore(operation)


print(expression.parseString("9-1+27+(3-5)+9"))

That prints:

[[9, '-', 1, '+', 27, '+', [[3, '-', 5]], '+', 9]]

It works, kinda. I want precedence and all sorted into Groups, but after trying a lot, i couldn't find a way to do it. More or less like this:

[[[[9, '-', 1], '+', 27], '+', [3, '-', 5]], '+', 9]

I want to keep it AST-looking, i would like to generate code from it.

I did saw the operatorPrecedence class? similar to Forward, but i don't think i understand how it works either.

EDIT:

Tried more in depth operatorPrecedence and i got this:

expression = pp.operatorPrecedence(number, [
    (pp.Literal("^"), 1, pp.opAssoc.RIGHT),
    (pp.Literal("*"), 2, pp.opAssoc.LEFT),
    (pp.Literal("/"), 2, pp.opAssoc.LEFT),
    (pp.Literal("+"), 2, pp.opAssoc.LEFT),
    (pp.Literal("-"), 2, pp.opAssoc.LEFT)
])

Which doesn't handle parenthesis (i don't know if i will have to postprocess the results) and i need to handle them.


Solution

  • The actual name for this parsing problem is "infix notation" (and in recent versions of pyparsing, I am renaming operatorPrecedence to infixNotation). To see the typical implementation of infix notation parsing, look at the fourFn.py example on the pyparsing wiki. There you will see an implementation of this simplified BNF to implement 4-function arithmetic, with precedence of operations:

    operand :: integer or real number
    factor :: operand | '(' expr ')'
    term :: factor ( ('*' | '/') factor )*
    expr :: term ( ('+' | '-') term )*
    

    So an expression is one or more terms separated by addition or subtraction operations.

    A term is one or more factors separated by multiplication or division operations.

    A factor is either a lowest-level operand (in this case, just integers or reals), OR an expr enclosed in ()'s.

    Note that this is a recursive parser, since factor is used indirectly in the definition of expr, but expr is also used to define factor.

    In pyparsing, this looks roughly like this (assuming that integer and real have already been defined):

    LPAR,RPAR = map(Suppress, '()')
    expr = Forward()
    operand = real | integer
    factor = operand | Group(LPAR + expr + RPAR)
    term = factor + ZeroOrMore( oneOf('* /') + factor )
    expr <<= term + ZeroOrMore( oneOf('+ -') + term )
    

    Now using expr, you can parse any of these:

    3
    3+2
    3+2*4
    (3+2)*4
    

    The infixNotation pyparsing helper method takes care of all the recursive definitions and groupings, and lets you define this as:

    expr = infixNotation(operand,
            [
            (oneOf('* /'), 2, opAssoc.LEFT),
            (oneOf('+ -'), 2, opAssoc.LEFT),
            ])
    

    But this obscures all the underlying theory, so if you are trying to understand how this is implemented, look at the raw solution in fourFn.py.

    [EDIT - 18 Dec 2022] For those looking for a pre-defined solution, I've packaged infixNotation up into its own pip-installable package called plusminus. plusminus defines a BaseArithmeticParser class for creating a ready-to-run parser and evaluator that supports these operators:

      **   ÷   >=  ∈  in   ?:
      *    +   ==  ∉  not  |absolute-value|
      //   -   !=  ∩  and
      /    <   ≠   ∪  ∧
      mod  >   ≤   &  or
      ×    <=  ≥   |  ∨
    

    And these functions:

      abs    ceil   max
      round  floor  str
      trunc  min    bool
    

    The BaseArithmeticParser class allows you to define additional operators and functions for your own domain-specific expressions, and the examples show how to define parsers with custom functions and operators for dice rolling, retail price discounts, among others.