
Pyparsing Precedence breaks with Unary operator


I'm trying to implement a subset of Python's operators for arithmetic parsing using pyparsing. I have the following code implementing my parser:

variable_names = pyparsing.Combine(pyparsing.Literal('$') + pyparsing.Word(pyparsing.alphanums + '_'))
integer = pyparsing.Word(pyparsing.nums)
double = pyparsing.Combine(pyparsing.Word(pyparsing.nums) + '.' + pyparsing.Word(pyparsing.nums))
parser = pyparsing.operatorPrecedence(variable_names | double | integer, [
                                ('**', 2, pyparsing.opAssoc.RIGHT),
                                ('-', 1, pyparsing.opAssoc.RIGHT),
                                (pyparsing.oneOf('* / // %'), 2, pyparsing.opAssoc.LEFT),
                                (pyparsing.oneOf('+ -'), 2, pyparsing.opAssoc.LEFT),
                                (pyparsing.oneOf('> >= < <= == !='), 2, pyparsing.opAssoc.LEFT),
                                ('not', 1, pyparsing.opAssoc.RIGHT),
                                ('and', 2, pyparsing.opAssoc.LEFT),
                                ('or', 2, pyparsing.opAssoc.LEFT)])

For the most part this works fine, but it sometimes breaks when I use the unary -. Specifically, I think (I may be wrong) it breaks when - follows a higher-precedence operator, which in this case is just **. The following examples show the issue:

parsing 5 * 10 * -2             yields: ['5', '*', '10', '*', ['-', '2']]
parsing 5 * 10 ** -2            yields: ['5', '*', '10']               # Wrong
parsing 5 * 10 ** (-2)          yields: ['5', '*', ['10', '**', ['-', '2']]]
parsing 5 and not 8             yields: ['5', 'and', ['not', '8']]
parsing 5 and - 8               yields: ['5', 'and', ['-', '8']]

Is there any reason why this is happening? What am I missing?


Solution

  • In my opinion, you should give unary - a higher precedence than **, i.e. list it before ** in the operator table:

    ('-', 1, pyparsing.opAssoc.RIGHT),
    ('**', 2, pyparsing.opAssoc.RIGHT),
    

    and this should resolve your problem.
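
A likely reason for the silent truncation in the question: the operands of the first (tightest) level can only be plain atoms, so with ** listed first, -2 is not accepted after **. The parser then falls back to the longest prefix it can match (5 * 10) and, without parseAll=True, stops there without complaining. The sketch below makes that visible. It is a reduced grammar for illustration, not the full one from the question; it uses infixNotation (the name that replaced operatorPrecedence in later pyparsing releases), and parses_fully is a hypothetical helper.

```python
import pyparsing as pp

atom = pp.Word(pp.nums)

# Ordering from the question: '**' binds tighter than unary '-'.
original = pp.infixNotation(atom, [
    ('**', 2, pp.opAssoc.RIGHT),
    ('-', 1, pp.opAssoc.RIGHT),
    (pp.oneOf('* / // %'), 2, pp.opAssoc.LEFT),
])

# Ordering suggested in this answer: unary '-' binds tighter than '**'.
fixed = pp.infixNotation(atom, [
    ('-', 1, pp.opAssoc.RIGHT),
    ('**', 2, pp.opAssoc.RIGHT),
    (pp.oneOf('* / // %'), 2, pp.opAssoc.LEFT),
])


def parses_fully(parser, text):
    """Return True only if the parser consumes the whole input string."""
    try:
        # parseAll=True turns leftover input into a ParseException
        # instead of a silently truncated result.
        parser.parseString(text, parseAll=True)
        return True
    except pp.ParseException:
        return False


for text in ["5 * 10 ** (-2)", "5 * 10 ** -2"]:
    print(f"{text!r}: original={parses_fully(original, text)}, "
          f"fixed={parses_fully(fixed, text)}")
```

Passing parseAll=True while developing a grammar is a cheap way to catch this kind of partial match early.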


    Minimal working code

    import pyparsing
    
    variable_names = pyparsing.Combine(pyparsing.Literal('$') + pyparsing.Word(pyparsing.alphanums + '_'))
    
    integer = pyparsing.Word(pyparsing.nums)
    
    double = pyparsing.Combine(pyparsing.Word(pyparsing.nums) + '.' + pyparsing.Word(pyparsing.nums))
    
    parser = pyparsing.operatorPrecedence(
                variable_names | double | integer,
                [
                    ('-',  1, pyparsing.opAssoc.RIGHT),
                    ('**', 2, pyparsing.opAssoc.RIGHT),
                    (pyparsing.oneOf('* / // %'), 2, pyparsing.opAssoc.LEFT),
                    (pyparsing.oneOf('+ -'), 2, pyparsing.opAssoc.LEFT),
                    (pyparsing.oneOf('> >= < <= == !='), 2, pyparsing.opAssoc.LEFT),
                    ('not', 1, pyparsing.opAssoc.RIGHT),
                    ('and', 2, pyparsing.opAssoc.LEFT),
                    ('or',  2, pyparsing.opAssoc.LEFT)
                ]
            )
    
    examples = [
        "5 * 10 ** -2",
        "5 * 10 * -2",
        "5 * 10 ** (-2)",
        "5 * -10 ** 2",
        "5 * (-10) ** 2",    
        "5 and not 8",
        "5 and -8",
        "1 ** -2",
        "-1 ** 2",
    ]
    
    longest = max(map(len, examples))
    
    for ex in examples:
        result = parser.parseString(ex)
        print(f'{ex:{longest}}  <=>  {result}')
    

    Results:

    5 * 10 ** -2    <=>  [['5', '*', ['10', '**', ['-', '2']]]]
    5 * 10 * -2     <=>  [['5', '*', '10', '*', ['-', '2']]]
    5 * 10 ** (-2)  <=>  [['5', '*', ['10', '**', ['-', '2']]]]
    5 * -10 ** 2    <=>  [['5', '*', [['-', '10'], '**', '2']]]
    5 * (-10) ** 2  <=>  [['5', '*', [['-', '10'], '**', '2']]]
    5 and not 8     <=>  [['5', 'and', ['not', '8']]]
    5 and -8        <=>  [['5', 'and', ['-', '8']]]
    1 ** -2         <=>  [['1', '**', ['-', '2']]]
    -1 ** 2         <=>  [[['-', '1'], '**', '2']]
    

    BTW, for comparison, see the operator precedence tables for C and for Python.
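
To check how Python itself groups these expressions, the standard ast module can be used. This is only a small sketch for comparison; describe and shape are hypothetical helper names, and the code assumes Python 3.8+ (for ast.Constant). It illustrates that in Python, ** binds tighter than a unary minus on its left but still accepts a unary minus on its right.

```python
import ast


def describe(node):
    """Turn a small arithmetic AST into nested lists, similar to pyparsing output."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.UnaryOp):
        return [type(node.op).__name__, describe(node.operand)]
    if isinstance(node, ast.BinOp):
        return [describe(node.left), type(node.op).__name__, describe(node.right)]
    raise TypeError(f"unsupported node: {type(node).__name__}")


def shape(source):
    """Parse a single expression and describe its grouping."""
    return describe(ast.parse(source, mode="eval").body)


print(shape("-10 ** 2"))   # unary minus applied to the whole power
print(shape("10 ** -2"))   # unary minus allowed as the exponent
print(eval("5 * -10 ** 2"))
```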


    EDIT:

    I can get -500 for 5 * -10 ** 2 (parsed as [[5, '*', ['-', [10, '**', 2]]]]) if I keep ** before - in the operator table and define integer as a signed integer:

    integer = pyparsing.pyparsing_common.signed_integer
    

    import pyparsing
    
    variable_names = pyparsing.Combine(pyparsing.Literal('$') + pyparsing.Word(pyparsing.alphanums + '_'))
    
    #integer = pyparsing.Word(pyparsing.nums)
    integer = pyparsing.pyparsing_common.signed_integer
    
    double = pyparsing.Combine(pyparsing.Word(pyparsing.nums) + '.' + pyparsing.Word(pyparsing.nums))
    
    parser = pyparsing.operatorPrecedence(
                variable_names | double | integer,
                [
                    ('**', 2, pyparsing.opAssoc.RIGHT),
                    ('-',  1, pyparsing.opAssoc.RIGHT),
                    (pyparsing.oneOf('* / // %'), 2, pyparsing.opAssoc.LEFT),
                    (pyparsing.oneOf('+ -'), 2, pyparsing.opAssoc.LEFT),
                    (pyparsing.oneOf('> >= < <= == !='), 2, pyparsing.opAssoc.LEFT),
                    ('not', 1, pyparsing.opAssoc.RIGHT),
                    ('and', 2, pyparsing.opAssoc.LEFT),
                    ('or',  2, pyparsing.opAssoc.LEFT)
                ]
            )
    
    examples = [
        "5 * 10 ** -2",
        "5 * 10 * -2",
        "5 * 10 ** (-2)",
        "5 * -10 ** 2",
        "5 * (-10) ** 2",    
        "5 and not 8",
        "5 and -8",
        "1 ** -2",
        "-1 ** 2",
    ]
    
    longest = max(map(len, examples))
    
    for ex in examples:
        result = parser.parseString(ex)
        print(f'{ex:{longest}}  <=>  {result}')
    

    Result:

    5 * 10 ** -2    <=>  [[5, '*', [10, '**', -2]]]
    5 * 10 * -2     <=>  [[5, '*', 10, '*', ['-', 2]]]
    5 * 10 ** (-2)  <=>  [[5, '*', [10, '**', ['-', 2]]]]
    5 * -10 ** 2    <=>  [[5, '*', ['-', [10, '**', 2]]]]
    5 * (-10) ** 2  <=>  [[5, '*', [['-', 10], '**', 2]]]
    5 and not 8     <=>  [[5, 'and', ['not', 8]]]
    5 and -8        <=>  [[5, 'and', ['-', 8]]]
    1 ** -2         <=>  [[1, '**', -2]]
    -1 ** 2         <=>  [['-', [1, '**', 2]]]
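
To see which numeric value each grouping produces, the nested lists can be evaluated with a small tree walker. This is a minimal sketch, not part of pyparsing: evaluate and BINARY are hypothetical names, it covers only the arithmetic operators and unary minus, and it assumes the leaves are already numbers (as they are when signed_integer is used) and that flat chains such as [5, '*', 10, '*', x] are left-associative.

```python
import operator

# Binary arithmetic operators from the grammar, mapped to Python functions.
BINARY = {
    '**': operator.pow,
    '*': operator.mul,
    '/': operator.truediv,
    '//': operator.floordiv,
    '%': operator.mod,
    '+': operator.add,
    '-': operator.sub,
}


def evaluate(node):
    """Evaluate a nested list such as [5, '*', ['-', [10, '**', 2]]]."""
    if not isinstance(node, list):
        return node  # leaf: assumed to be a number already
    if len(node) == 2:
        op, operand = node
        if op != '-':
            raise ValueError(f"unsupported unary operator: {op!r}")
        return -evaluate(operand)
    # Flat chain "a op b op c ...": fold from the left.
    result = evaluate(node[0])
    for i in range(1, len(node), 2):
        result = BINARY[node[i]](result, evaluate(node[i + 1]))
    return result


# Grouping with '**' before '-' (matches Python's own result):
print(evaluate([5, '*', ['-', [10, '**', 2]]]))
# Grouping with '-' before '**':
print(evaluate([5, '*', [['-', 10], '**', 2]]))
```

With a real parse result, the expression tree is the first element, e.g. evaluate(result.asList()[0]).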
    

    See the pyparsing_common documentation for other predefined expressions.