Search code examples
pythonstringparsingrecursionrecursive-descent

Recursive parsing for lisp like syntax


im trying to parse lines in the form:

(OP something something (OP something something ) ) ( OP something something )

Where OP is a symbol for a logical gate (AND, OR, NOT) and something is the thing i want to evaluate.

The output im looking for is something like:

{ 'OPERATOR': [condition1, condition2, .. , conditionN] }

Where a condition itself can be a dict/list pair itself (nested conditions). So far i tried something like:

        tree = dict()
        cond = list()
        tree[OP] = cond


    for string in conditions:
        self.counter += 1
        if string.startswith('('):
            try:
                OP = string[1]
            except IndexError:
                OP = 'AND'
            finally:
                if OP == '?':
                    OP = 'OR'
                elif OP == '!':
                    OP = 'N'

            # Recurse
            cond.append(self.parse_conditions(conditions[self.counter:], OP))
            break

        elif not string.endswith(")"):
            cond.append(string)

        else:
            return tree

    return tree

I tried other ways aswell but i just can't wrap my head around this whole recursion thing so im wondering if i could get some pointers here, i looked around the web and i found some stuff about recursive descent parsing but the tutorials were all trying to do something more complicated than i needed.

PS: i realize i could do this with existing python libraries but what would i learn by doing that eh?


Solution

  • I'm posting this without further comments, for learning purposes (in the real life please do use a library). Note that there's no error checking (a homework for you!)

    Feel free to ask if there's something you don't understand.

    # PART 1. The Lexer
    
    symbols = None
    
    def read(input):
        global symbols
        import re
        symbols = re.findall(r'\w+|[()]', input)
    
    def getsym():
        global symbols
        return symbols[0] if symbols else None
    
    def popsym():
        global symbols
        return symbols.pop(0)
    
    # PART 2. The Parser
    # Built upon the following grammar:
    #  
    #     program = expr*
    #     expr    = '(' func args ')'
    #     func    = AND|OR|NOT
    #     args    = arg*
    #     arg     = string|expr
    #     string  = [a..z]
    
    def program():
        r = []
        while getsym():
            r.append(expr())
        return r
    
    def expr():
        popsym() # (
        f = func()
        a = args()
        popsym() # )
        return {f: a}
    
    def func():
        return popsym()
    
    def args():
        r = []
        while getsym() != ')':
            r.append(arg())
        return r
    
    def arg():
        if getsym() == '(':
            return expr()
        return string()
    
    def string():
        return popsym()
    
    # TEST = Lexer + Parser
    
    def parse(input):
        read(input)
        return program()
    
    print parse('(AND a b (OR c d)) (NOT foo) (AND (OR x y))')
    # [{'AND': ['a', 'b', {'OR': ['c', 'd']}]}, {'NOT': ['foo']}, {'AND': [{'OR': ['x', 'y']}]}]