I am trying to write a grammar using infixNotation (previously operatorPrecedence), but I can't figure out how to use setResultsName with it.
The reason I am trying to do this is that I built a grammar for boolean search queries on top of the searchparser, but for very long expressions it fails with RecursionError: maximum recursion depth exceeded in comparison.
Since the searchparser does not use infixNotation, it seemed that switching to infixNotation could avoid this error. So I am trying to adapt the grammar to infixNotation, but my evaluation relies heavily on having a name for each operator in the structured parse result and, in particular, on having easy access to the operator's arguments.
I started off from the example given in the pyparsing book:
from pyparsing import (CaselessLiteral, Word, alphanums, infixNotation,
                       opAssoc, quotedString, removeQuotes)

and_ = CaselessLiteral("and")
or_ = CaselessLiteral("or")
not_ = CaselessLiteral("not")
searchTerm = Word(alphanums) | quotedString.setParseAction(removeQuotes)
searchExpr = infixNotation(searchTerm,
    [
        (not_, 1, opAssoc.RIGHT),
        (and_, 2, opAssoc.LEFT),
        (or_, 2, opAssoc.LEFT),
    ])
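Before reaching for results names, it helps to look at what this grammar already returns. The sketch below (assuming pyparsing 3, which still accepts the camelCase names used here) prints the parsed tokens as nested lists: infixNotation already wraps each operator together with its operands in a group, and the groups nest according to precedence.

```python
from pyparsing import (CaselessLiteral, Word, alphanums, infixNotation,
                       opAssoc, quotedString, removeQuotes)

and_ = CaselessLiteral("and")
or_ = CaselessLiteral("or")
not_ = CaselessLiteral("not")
searchTerm = Word(alphanums) | quotedString.setParseAction(removeQuotes)
searchExpr = infixNotation(searchTerm,
    [
        (not_, 1, opAssoc.RIGHT),
        (and_, 2, opAssoc.LEFT),
        (or_, 2, opAssoc.LEFT),
    ])

# Each precedence level groups its operator with its operands;
# the higher-precedence "and" group ends up nested inside the "or" group.
result = searchExpr.parseString("term1 or term2 and term3")
print(result.asList())  # [['term1', 'or', ['term2', 'and', 'term3']]]
```

So the tree structure is already there; what is missing is a convenient handle on each group, which is what the answer below addresses.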
So, how do I set the results name here?
If I try to set it to the operator:
or_ = CaselessLiteral("or").setResultsName("OR")
The resulting parse result for the string 'term1 OR term2 OR term3', dumped as XML, looks something like this:
<ITEM>
  <word>
    <word>
      <ITEM>term1</ITEM>
    </word>
    <OR>or</OR>
    <word>
      <ITEM>term2</ITEM>
    </word>
    <OR>or</OR>
    <word>
      <ITEM>term3</ITEM>
    </word>
  </word>
</ITEM>
This means that all terms and operators are on the same level, whereas I want something like the following, where the terms are nested as arguments of their operator:
<OR>
  <OR>
    <word>
      <ITEM>term1</ITEM>
    </word>
    <OR>
      <word>
        <ITEM>term2</ITEM>
      </word>
      <word>
        <ITEM>term3</ITEM>
      </word>
    </OR>
  </OR>
</OR>
I used to achieve this with something like the following in my previous grammar:
operatorOr << (Group(
    operatorAnd + Suppress(Keyword("OR", caseless=True)) + operatorOr
).setResultsName("OR") | operatorAnd)
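For comparison, here is a runnable sketch of that recursive style. operatorAnd is a hypothetical stand-in (a single word) so the example stays short; in the real grammar it would be its own recursive AND level. The named groups nest one level per OR, which produces the desired tree, but it also means every additional OR adds another level of recursion while parsing, which is consistent with the RecursionError on very long inputs.

```python
from pyparsing import Forward, Group, Keyword, Suppress, Word, alphanums

# Hypothetical stand-in: the real grammar would use a recursive AND level here.
operatorAnd = Word(alphanums)

operatorOr = Forward()
operatorOr <<= (Group(
    operatorAnd + Suppress(Keyword("OR", caseless=True)) + operatorOr
).setResultsName("OR") | operatorAnd)

result = operatorOr.parseString("term1 OR term2 OR term3")
print(result.asList())               # [['term1', ['term2', 'term3']]]
print(result["OR"][0])               # term1
print(result["OR"]["OR"].asList())   # ['term2', 'term3']
```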
but with infixNotation, how can I set a results name on the group made up of the operator and its two arguments?
I would encourage you to consider using classes as parse actions, to build up a tree of operation nodes, as opposed to using results names.
In the code below, I attach the UnOp and BinOp classes to each infixNotation operator level, so each level returns instances of those classes with their operator and operands attributes properly assigned:
from pyparsing import (CaselessLiteral, Word, alphanums, infixNotation,
                       opAssoc, quotedString, removeQuotes)

class OpNode:
    def __repr__(self):
        return "{}({}):{!r}".format(self.__class__.__name__,
                                    self.operator, self.operands)

class UnOp(OpNode):
    def __init__(self, tokens):
        # tokens[0] is the group: [operator, operand]
        self.operator = tokens[0][0]
        self.operands = [tokens[0][1]]

class BinOp(OpNode):
    def __init__(self, tokens):
        # tokens[0] is the group: [operand, op, operand, op, operand, ...]
        self.operator = tokens[0][1]
        self.operands = tokens[0][::2]

and_ = CaselessLiteral("and")
or_ = CaselessLiteral("or")
not_ = CaselessLiteral("not")
searchTerm = Word(alphanums) | quotedString.setParseAction(removeQuotes)
searchExpr = infixNotation(searchTerm,
    [
        (not_, 1, opAssoc.RIGHT, UnOp),
        (and_, 2, opAssoc.LEFT, BinOp),
        (or_, 2, opAssoc.LEFT, BinOp),
    ])
Here is a sample string showing how these nodes would be returned:
test = "term1 or term2 or term3 and term4 and not term5"
print(searchExpr.parseString(test))
Gives:
[BinOp(or):['term1', 'term2', BinOp(and):['term3', 'term4', UnOp(not):['term5']]]]
You can navigate this parsed tree and evaluate the different nodes based on their node type and operator.
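As an illustration, here is a minimal evaluator that walks the tree, assuming a document is represented as a set of lowercase words (the evaluate function and the word-set representation are hypothetical, not part of pyparsing). It dispatches on the node type: plain strings are search terms, and OpNode instances are handled according to their operator.

```python
from pyparsing import (CaselessLiteral, Word, alphanums, infixNotation,
                       opAssoc, quotedString, removeQuotes)


class OpNode:
    def __repr__(self):
        return "{}({}):{!r}".format(self.__class__.__name__,
                                    self.operator, self.operands)


class UnOp(OpNode):
    def __init__(self, tokens):
        self.operator = tokens[0][0]
        self.operands = [tokens[0][1]]


class BinOp(OpNode):
    def __init__(self, tokens):
        self.operator = tokens[0][1]
        self.operands = tokens[0][::2]


and_ = CaselessLiteral("and")
or_ = CaselessLiteral("or")
not_ = CaselessLiteral("not")
searchTerm = Word(alphanums) | quotedString.setParseAction(removeQuotes)
searchExpr = infixNotation(searchTerm,
    [
        (not_, 1, opAssoc.RIGHT, UnOp),
        (and_, 2, opAssoc.LEFT, BinOp),
        (or_, 2, opAssoc.LEFT, BinOp),
    ])


def evaluate(node, words):
    """Hypothetical evaluator: does a document (a set of lowercase words) match node?"""
    if isinstance(node, str):
        # Leaf: a plain search term.
        return node.lower() in words
    op = node.operator.lower()
    if op == "not":
        return not evaluate(node.operands[0], words)
    if op == "and":
        return all(evaluate(arg, words) for arg in node.operands)
    if op == "or":
        return any(evaluate(arg, words) for arg in node.operands)
    raise ValueError("unknown operator: {!r}".format(node.operator))


query = searchExpr.parseString("term1 or term2 or term3 and term4 and not term5")[0]
print(evaluate(query, {"term3", "term4"}))           # True
print(evaluate(query, {"term3", "term4", "term5"}))  # False
```

Because each node carries its own operator and operand list, adding a new operator level only means adding another branch to the dispatch, with no results-name lookups involved.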
Also, asXML() is not the best tool for dumping out your parsed data; you are better off using the dump() method.
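A minimal sketch, reusing the question's grammar with the results name on or_ as in the question, so that dump() has a name to report alongside the nested token list:

```python
from pyparsing import (CaselessLiteral, Word, alphanums, infixNotation,
                       opAssoc, quotedString, removeQuotes)

and_ = CaselessLiteral("and")
or_ = CaselessLiteral("or").setResultsName("OR")  # named operator, as in the question
not_ = CaselessLiteral("not")
searchTerm = Word(alphanums) | quotedString.setParseAction(removeQuotes)
searchExpr = infixNotation(searchTerm,
    [
        (not_, 1, opAssoc.RIGHT),
        (and_, 2, opAssoc.LEFT),
        (or_, 2, opAssoc.LEFT),
    ])

result = searchExpr.parseString("term1 or term2 or term3")
# dump() lists the tokens first, then any results names and indexed sub-groups.
text = result.dump()
print(text)
```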