Search code examples
pythonparsingtokenright-to-left

Parsing right-associative operator (exponents)


I've been writing an lexer/parser/interpreter for my own language and so far all has been working. I've been following the examples over at Ruslan Spivak's blog (Github link to each article).

I wanted to extend my language grammar past what is written in the articles to include more operators like comparisons (<, >=, etc.) and also exponents (** or ^ in my language). I have this grammar:

expression        : exponent ((ADD | SUB) exponent)*
exponent          : term ((POWER) term)*
# this one is right-associative (powers **)

term              : comparison ((MUL | DIV) comparison)*
comparison        : factor ((EQUAl   | L_EQUAL | LESS
                             N_EQUAL | G_EQUAL | GREATER) factor)*
# these are all binary operations



factor            : NUM | STR        | variable
                        | ADD factor | SUB factor
                        | LPAREN expr RPAREN
# different types of 'base' types like integers
# also contains parenthesised expressions which are evalutaed first

In terms of parsing tokens, I use the same method as used in Ruslan's blog. Here is one that will parse the exponent line, which handles addition and subtraction despite its name, as the grammar says that expressions are parsed as exponent_expr (+ / -) exponent_expr

def exponent(self):
    node = self.term()
    while self.current_token.type in (ADD, SUB):
        token = self.current_token

        if token.type == ADD:
            self.consume_token(ADD)
        elif token.type == SUB:
            self.consume_token(SUB)

        node = BinaryOperation(left_node=node,
                               operator=token,
                               right_node=self.term())

    return node

Now this parses left-associative tokens just fine (since the token stream comes left to right naturally), but I am stuck on how to parse right-associative exponents. Look at this expected in/out for reference:

>>> 2 ** 3 ** 2
# should be parsed as...
>>> 2 ** (3 ** 2)
# which is...
>>> 2 ** 9
# which returns...
512

# Mine, at the moment, parses it as...
>>> (2 ** 3) ** 2
# which is...
>>> 8 ** 2
# which returns...
64

To solve this, I tried switching the BinaryOperation() constructor's left and right nodes to make the current node the right and the new node the left, but this just makes 2**5 parse as 5**2 which gives me 25 instead of the expected 32.

Any approaches that I could try?


Solution

  • The fact that your exponent function actually parses expressions should have been a red flag. In fact, what you need is an expression function which parses expressions and an exponent function which parses exponentiations.

    You've also mixed up the precedences of exponentiation and multiplication (and other operations), because 2 * x ** 4 does not mean (2 * x) ** 4 (which would be 16x⁴), but rather 2 * (x ** 4). By the same token, x * 3 < 17 does not mean x * (3 < 17), which is how your grammar will parse it.

    Normally the precedences for arithmetics look something like this:

     comparison     <, <=, ==, ... ( lowest precedence)
     additive       +, -
     multiplicative *, /, %
     unary          +, -
     exponentiation **
     atoms          numbers, variables, parenthesized expressions, etc.
    

    (If you had postfix operators like function calls, they would go in between exponentiation and atoms.)

    Once you've reworked your grammar in this form, the exponent parser will look something like this:

    def exponent(self):
        node = self.term()
        while self.current_token.type is POWER:
            self.consume_token(ADD)
            node = BinaryOperation(left_node=node,
                                   operator=token,
                                   right_node=self.exponent())
        return node
    

    The recursive call at the end produces right associativity. In this case recursion is acceptable because the left operand and the operator have already been consumed. Thus the recursive call cannot produce an infinite loop.