python, parsing, pyparsing, amazon-s3-select

Parse expression with binary and unary operators, reserved words, and without parentheses


I'm trying to parse expressions made of the binary operator +, the unary operator not, and identifiers, which can be any alphabetic string other than the reserved word not:

from pyparsing import (
    CaselessKeyword,
    Combine,
    Word,
    alphas,
    opAssoc,
    infixNotation,
)

identifier = Combine(~CaselessKeyword('not') + Word(alphas))
expression = infixNotation(identifier, [
    ('+', 2, opAssoc.LEFT),
    (CaselessKeyword('not'), 1, opAssoc.RIGHT),
])

Running

expression.parseString('a + (not b)')

gives what I expect

[['a', '+', ['not', 'b']]]

However, without the parentheses

expression.parseString('a + not b')

I only get the first token:

['a']

How can I define the language to work as I would like without the parentheses?

(In the real case there are more operators and reserved words; this is a step towards parsing the S3 Select language.)


Solution

  • In S3 Select, NOT has higher precedence than AND:

    Operator Precedence: The following table shows the operators' precedence in decreasing order.

    (from the Amazon S3 documentation).

    In that table NOT is above AND.

    infixNotation expects its operator list ordered from highest precedence to lowest, so your code should be:

    identifier = Combine(~CaselessKeyword('not') + Word(alphas))
    expression = infixNotation(identifier, [
        (CaselessKeyword('not'), 1, opAssoc.RIGHT),
        ('+', 2, opAssoc.LEFT),
    ])
    

    BTW: if NOT were listed lower than the binary +, then a + not b would not be a valid expression. + needs two operands: one is a, but not b is not a valid operand at that precedence level.

    BTW2 (from comments): Please don't mix +, which is an arithmetic operator, and NOT, which is a logical operator, in the same expression. 1 + not 2 is not a valid expression. Every language decides for itself how to parse these kinds of strange expressions.
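
To see the effect of the operator order, here is a minimal sketch that builds the grammar both ways and parses the same input. It assumes pyparsing is installed. The names plus_first, not_first and parse_all are hypothetical helpers introduced only for this illustration. Passing parseAll=True makes pyparsing raise a ParseException when it cannot consume the whole string, which exposes the failure that the default call hides by returning only the first token.

```python
from pyparsing import (
    CaselessKeyword,
    Combine,
    ParseException,
    Word,
    alphas,
    infixNotation,
    opAssoc,
)

NOT = CaselessKeyword('not')
# An identifier is any alphabetic word that is not the reserved word 'not'.
identifier = Combine(~NOT + Word(alphas))

# Order from the question: '+' is listed first, so it binds tighter than 'not'.
plus_first = infixNotation(identifier, [
    ('+', 2, opAssoc.LEFT),
    (NOT, 1, opAssoc.RIGHT),
])

# Order from the answer: 'not' is listed first, so it binds tighter than '+'.
not_first = infixNotation(identifier, [
    (NOT, 1, opAssoc.RIGHT),
    ('+', 2, opAssoc.LEFT),
])


def parse_all(grammar, text):
    """Hypothetical helper: parse text, requiring the whole string to match.

    Returns the nested token list, or None if the grammar rejects the input.
    """
    try:
        return grammar.parseString(text, parseAll=True).asList()
    except ParseException:
        return None


# Default behaviour: parsing stops silently after the first token.
partial = plus_first.parseString('a + not b').asList()
# Strict behaviour: the same grammar rejects the full string.
strict = parse_all(plus_first, 'a + not b')
# With 'not' at the higher precedence level the full string parses.
fixed = parse_all(not_first, 'a + not b')

print(partial)  # ['a']
print(strict)   # None
print(fixed)    # [['a', '+', ['not', 'b']]]
```

Using parseAll=True while developing a grammar is a cheap way to catch input that is only partially matched.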