Search code examples
pythonalgorithmparsingabstract-syntax-tree

Make a while parsing function recursive


I have a basic function to parse a lisp expression. It's using a while loop, but as an exercise I'd like to convert it into a recursive function. However, it's a bit tricky for me to do. Here is what I have thus far:

def build_ast(self, tokens=None):
    # next two lines example input to make self-contained
    LEFT_PAREN, RIGHT_PAREN = '(', ')'
    tokens = ['(', '+', '2', '(', '*', '3', '4', ')', ')']
    while RIGHT_PAREN in tokens:
        right_idx = tokens.index(RIGHT_PAREN)
        left_idx = right_idx - tokens[:right_idx][::-1].index(LEFT_PAREN)-1
        extraction = [tokens[left_idx+1:right_idx],]
        tokens = tokens[:left_idx] + extraction + tokens[right_idx+1:]
    ast = tokens
    return ast

And so it would parse something like this:

(+ 2 (* 3 4))

Into this:

[['+', '2', ['*', '3', '4']]]

What would be an example of how I could make the above function recursive? So far I've started with something like:

def build_ast(self, ast=None):
    if ast is None: ast=self.lexed_tokens
    if RIGHT_PAREN not in ast:
        return ast
    else:
        right_idx = ast.index(RIGHT_PAREN)
        left_idx = right_idx - ast[:right_idx][::-1].index(LEFT_PAREN)-1
        ast = ast[:left_idx] + [ast[left_idx+1:right_idx],] + ast[right_idx+1:]
        return self.build_ast(ast)

But it just comes across as a bit strange (as if the recursion isn't helpful here). What would be a better way to construct this? Or perhaps a better/more elegant algorithm to build this simple ast?


Solution

  • You can use a recursive generator function:

    def _build_ast(tokens):
       LEFT_PAREN, RIGHT_PAREN = '(', ')'
       #consume the iterator until it is empty or a right paren occurs
       while (n:=next(tokens, None)) is not None and n != RIGHT_PAREN:
          #recursively call _build_ast if we encounter a left paren
          yield n if n != LEFT_PAREN else list(_build_ast(tokens))
       
    
    def build_ast(tokens):
       #pass tokens as an iterator to _build_ast
       return list(_build_ast(iter(tokens)))
    
    tokens = ['(', '+', '2', '(', '*', '3', '4', ')', ')']
    print(build_ast(tokens))
    

    Output:

    [['+', '2', ['*', '3', '4']]]