Tags: python, regex, python-2.7, pyparsing

Python Parsing Expression and replacing with another expression


I am using pyparsing to parse some text. I created a grammar and it works as expected. However, for an expression like this one:

OR(OR(in1, in2), in3)

I want to replace each nested expression with an "alias" and then create a separate expression for that alias. In simple terms:

# I have this expression ( OR(OR(in1, in2), in3) )
# Which I parsed to
parsed = ["OR", [["OR", ["in1", "in2"]], "in3"]]

# I want to have
exp1 = ["OR", ["in1", "in2"]]
exp2 = ["OR", ["exp1", "in3"]]

This is a minimal example; in general the expressions can be nested to any depth (each with exactly two arguments). Is there a way to do this?


Solution

  • Here is a parser that is probably similar to the one you wrote:

    import pyparsing as pp
    
    LPAR, RPAR = map(pp.Suppress, "()")
    OR = pp.Keyword("OR")
    term = pp.pyparsing_common.identifier
    
    or_expr = pp.Forward()
    or_expr <<= pp.Group(OR + pp.Group(LPAR + pp.delimitedList(or_expr | term) + RPAR))
    

    When it parses the string you gave, it provides the same nested output.
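
To check that claim, here is a minimal sketch that runs the grammar on the string from the question. The names `sample` and `result` are only illustrative, and the camelCase calls (`parseString`, `asList`, `delimitedList`) are the ones used elsewhere in this answer.

```python
import pyparsing as pp

LPAR, RPAR = map(pp.Suppress, "()")
OR = pp.Keyword("OR")
term = pp.pyparsing_common.identifier

# or_expr refers to itself, so it has to be declared as a Forward first
or_expr = pp.Forward()
or_expr <<= pp.Group(OR + pp.Group(LPAR + pp.delimitedList(or_expr | term) + RPAR))

sample = "OR(OR(in1, in2), in3)"
result = or_expr.parseString(sample).asList()
print(result)
```

Note that `parseString` wraps the match in one more list, so it is `result[0]` that corresponds to the `parsed` structure in the question.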

    To create the "expN" names, you can use a parse action that collects each expression, together with its generated name, in a global list:

    # add parse action to convert OR's to exprs
    exprs = []
    def generate_expr_definition(tokens):
        expr_name = "exp{}".format(len(exprs)+1)
        exprs.append((expr_name, tokens.asList()[0]))
        return expr_name
    or_expr.addParseAction(generate_expr_definition)
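
One thing to watch with a global list is that it keeps growing if you parse more than one string. A possible workaround, sketched here under the assumption that rebuilding the grammar per call is acceptable, is to keep the list local to a function. `collect_exprs` is a hypothetical helper name, not part of pyparsing.

```python
import pyparsing as pp


def collect_exprs(text):
    """Parse text and return [(name, expr), ...] from a list local to this call."""
    exprs = []

    def generate_expr_definition(tokens):
        # Nested OR's finish parsing first, so they get the lower numbers
        expr_name = "exp{}".format(len(exprs) + 1)
        exprs.append((expr_name, tokens.asList()[0]))
        return expr_name

    LPAR, RPAR = map(pp.Suppress, "()")
    OR = pp.Keyword("OR")
    term = pp.pyparsing_common.identifier

    or_expr = pp.Forward()
    or_expr <<= pp.Group(OR + pp.Group(LPAR + pp.delimitedList(or_expr | term) + RPAR))
    or_expr.addParseAction(generate_expr_definition)

    or_expr.parseString(text)
    return exprs


first = collect_exprs("OR(OR(in1, in2), OR(in3, in4))")
second = collect_exprs("OR(in5, in6)")
for name, expr in first:
    print("{} = {}".format(name, expr))
```

Each call starts numbering again at exp1, and the two inner expressions in the first string are numbered before the outer one that refers to them.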
    

    When you run this parser, the created results aren't the important part. What is important is the exprs list that was built while parsing:

    sample = "OR(OR(in1, in2), in3)"
    or_expr.parseString(sample)
    
    # generate assignments for each nested OR expr
    for name, expr in exprs:
        print("{} = {}".format(name, expr))
    

    This gives:

    exp1 = ['OR', ['in1', 'in2']]
    exp2 = ['OR', ['exp1', 'in3']]
    

    Now I look at that and ask, "How will I know the difference between an 'exp1' that was parsed from the input and an 'exp1' that is supposed to represent a parsed expression?" If this is to be interpreted as a Python assignment, it should really read:

    exp2 = ['OR', [exp1, 'in3']]
    

    with no quotes around the variable name.

    To do this, we need to return an object from the parse action that will repr as the name without the surrounding quotes. Like this:

    class ExprName:
        def __init__(self, name):
            self._name = name
        def __repr__(self):
            return self._name
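
The reason this works is that printing a list calls `repr()` on each element, so an object whose `__repr__` returns the bare name shows up without quotes. A small sketch without the parser; `as_string` and `as_object` are illustrative names only.

```python
class ExprName:
    def __init__(self, name):
        self._name = name

    def __repr__(self):
        return self._name


as_string = ['OR', ['exp1', 'in3']]            # plain str: printed with quotes
as_object = ['OR', [ExprName('exp1'), 'in3']]  # ExprName: printed as a bare name

print(as_string)  # ['OR', ['exp1', 'in3']]
print(as_object)  # ['OR', [exp1, 'in3']]
```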
    

    Change the return statement in the parse action to:

    return ExprName(expr_name)
    

    And the resulting output now looks like:

    exp1 = ['OR', ['in1', 'in2']]
    exp2 = ['OR', [exp1, 'in3']]
    

    Now you can distinguish the generated expN vars from parsed inputs.
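
If you already have the nested list (like `parsed` in the question) and don't want to re-run the parser, the same numbering can also be produced by walking the list bottom-up. This is an alternative sketch, not the parse-action approach above; it assumes every expression has the shape [op, [arg, ...]] and that every argument is either a string or another such expression. `flatten_exprs` is a hypothetical helper name.

```python
class ExprName:
    """Placeholder whose repr is the bare alias name (no quotes)."""
    def __init__(self, name):
        self._name = name

    def __repr__(self):
        return self._name


def flatten_exprs(parsed):
    """Return [(name, [op, args]), ...] with nested expressions replaced by aliases."""
    exprs = []

    def visit(node):
        if isinstance(node, str):
            return node  # a plain input name such as 'in1'
        op, args = node
        # Visit the arguments first so inner expressions get the lower numbers
        new_args = [visit(arg) for arg in args]
        name = "exp{}".format(len(exprs) + 1)
        exprs.append((name, [op, new_args]))
        return ExprName(name)

    visit(parsed)
    return exprs


parsed = ["OR", [["OR", ["in1", "in2"]], "in3"]]
lines = ["{} = {}".format(name, expr) for name, expr in flatten_exprs(parsed)]
for line in lines:
    print(line)
```

For the list from the question this prints the same two assignments as shown above.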