I'm trying to use pyparsing==2.4.7 to parse search queries that have a field:value format.
Examples of the strings I want to parse include:
field1:value1
field1:value1 field2:value2
field1:value1 AND field2:value2
(field1:value1a OR field1:value1b) field2:value2
(field1:value1a | field1:value1b) & (field2:value2a | field2:value2b)
A few things to note:
- I want OR and | to both mean "OR", and likewise AND and & to mean the same thing.
- AND is implied when no operator separates two terms.
- ...) will never have spaces

I have written a parser that works (code is based on this SO answer), but only when all of the operators (AND and OR) are present:
import pyparsing as pp
from pyparsing import Word, alphas, alphanums, White, Combine, OneOrMore, Literal, oneOf
field_name = Word(alphanums).setResultsName('field_name')
search_value = Word(alphanums + '-').setResultsName('search_value')
operator = Literal(':')
query = field_name + operator + search_value
AND = oneOf(['AND', 'and', '&', ' '])
OR = oneOf(['OR', 'or', '|'])
NOT = oneOf(['NOT', 'not', '!'])
query_expr = pp.infixNotation(query, [
(NOT, 1, pp.opAssoc.RIGHT, ),
(AND, 2, pp.opAssoc.LEFT, ),
(OR, 2, pp.opAssoc.LEFT, ),
])
class ComparisonExpr:
    def __init__(self, tokens):
        self.tokens = tokens

    def __str__(self):
        return "Comparison:('field': {!r}, 'operator': {!r}, 'value': {!r})".format(*self.tokens)

    def __repr__(self):
        return self.__str__()

query.addParseAction(ComparisonExpr)
sample = "(field1:value1a | field1:value1b) & (field2:value2a | field2:value2b)"
result = query_expr.parseString(sample).asList()
from pprint import pprint
>>> pprint(result)
[[[Comparison:('field': 'field1', 'operator': ':', 'value': 'value1a'),
'|',
Comparison:('field': 'field1', 'operator': ':', 'value': 'value1b')],
'&',
[Comparison:('field': 'field2', 'operator': ':', 'value': 'value2a'),
'|',
Comparison:('field': 'field2', 'operator': ':', 'value': 'value2b')]]]
However, if I try it with a sample that is missing an operator, the parser appears to stop at the point where an operator would be expected:
sample = "(field1:value1a | field1:value1b) (field2:value2a | field2:value2b)"
result = query_expr.parseString(sample).asList()
from pprint import pprint
pprint(result)
[[Comparison:('field': 'field1', 'operator': ':', 'value': 'value1a'),
'|',
Comparison:('field': 'field1', 'operator': ':', 'value': 'value1b')]]
Is there a way to make whitespace an "implicit AND" if there is no operator separating terms?
Short answer: replace your definition of AND with:
AND = oneOf(['AND', 'and', '&']) | pp.Empty()
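To see the effect in isolation, here is a minimal sketch of the same idea. It is a cut-down grammar, not your full parser: the name term and the use of pp.Group are assumptions made for the example, and NOT is left out. With a plain Empty() the implicit operator contributes no token, so the two operands simply sit side by side in the result.

```python
import pyparsing as pp

# Each field:value term is grouped so it appears as one nested list.
term = pp.Group(pp.Word(pp.alphanums) + ":" + pp.Word(pp.alphanums + "-"))

# Explicit operators are tried first; Empty() is the fallback that always matches.
AND = pp.oneOf(["AND", "and", "&"]) | pp.Empty()
OR = pp.oneOf(["OR", "or", "|"])

query_expr = pp.infixNotation(term, [
    (AND, 2, pp.opAssoc.LEFT),
    (OR, 2, pp.opAssoc.LEFT),
])

implicit = query_expr.parseString("field1:value1 field2:value2", parseAll=True).asList()
explicit = query_expr.parseString("field1:value1 AND field2:value2", parseAll=True).asList()

print(implicit)  # the two terms in one group, no operator token between them
print(explicit)  # the same two terms with 'AND' between them
```

Passing parseAll=True is a choice made for the sketch: it turns a parse that stops early into an exception instead of a silently truncated result.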
Some other suggestions:
For easier post-parse processing, you may want the Empty() to actually emit a "&" operator. You can do that with a parse action:
AND = oneOf(['AND', 'and', '&']) | pp.Empty().addParseAction(lambda: "&")
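As a quick illustration of what that parse action does on its own, the sketch below puts it between two plain words. The names implicit_and and word are made up for the example and are not part of your grammar.

```python
import pyparsing as pp

# Empty() matches without consuming input; the parse action replaces its
# (empty) token list with a single "&" token.
implicit_and = pp.Empty().addParseAction(lambda: "&")
word = pp.Word(pp.alphas)

tokens = (word + implicit_and + word).parseString("left right").asList()
print(tokens)
```

The "&" in the result never appeared in the input string; it is produced entirely by the parse action.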
In fact, you can normalize all your operators to just "&", "|", and "!", again, to skip any "if operator == 'AND' or operator == 'and' or ..." code. Put your parse action on the whole expression:
AND = (oneOf(['AND', 'and', '&']) | pp.Empty()).addParseAction(lambda: "&")
OR = oneOf(['OR', 'or', '|']).addParseAction(lambda: "|")
NOT = oneOf(['NOT', 'not', '!']).addParseAction(lambda: "!")
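To show why the normalization pays off, here is a rough sketch of a post-parse step that walks the nested list. It assumes, purely for illustration, that each term has been reduced to a (field, value) tuple and that a record is a plain dict; evaluate is a hypothetical helper, not something pyparsing provides. Because every operator is already one of "&", "|" or "!", there is a single comparison per operator.

```python
def evaluate(node, record):
    """Evaluate a normalized query tree against a dict of field values."""
    if isinstance(node, tuple):
        # Leaf term: a hypothetical (field, value) pair.
        field, value = node
        return record.get(field) == value
    if node[0] == "!":
        # Unary group: ['!', operand]
        return not evaluate(node[1], record)
    # Binary group: operand, op, operand, op, operand, ...
    # All operators within one group share the same precedence level.
    operator = node[1]
    results = (evaluate(operand, record) for operand in node[::2])
    return all(results) if operator == "&" else any(results)


tree = [
    [("field1", "value1a"), "|", ("field1", "value1b")],
    "&",
    [("field2", "value2a"), "|", ("field2", "value2b")],
]

match = evaluate(tree, {"field1": "value1b", "field2": "value2a"})
no_match = evaluate(tree, {"field1": "value1a", "field2": "other"})
negated = evaluate(["!", ("field1", "x")], {"field1": "y"})
print(match, no_match, negated)
```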
Also, since you are now accepting "" as equivalent to "&", you should make pyparsing treat your operators like keywords, so that "oregon" is not misread as "or egon". Add the asKeyword=True argument to all your oneOf expressions:
AND = (oneOf(['AND', 'and', '&'], asKeyword=True)
       | pp.Empty()).addParseAction(lambda: "&")
OR = oneOf(['OR', 'or', '|'], asKeyword=True).addParseAction(lambda: "|")
NOT = oneOf(['NOT', 'not', '!'], asKeyword=True).addParseAction(lambda: "!")
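The difference is easiest to see on a single word. This sketch uses pp.Literal and pp.Keyword directly rather than oneOf, only to isolate the behavior; the variable names are made up for the example.

```python
import pyparsing as pp

literal_or = pp.Literal("or")
keyword_or = pp.Keyword("or")

# A Literal is satisfied by the leading "or" of "oregon".
literal_tokens = literal_or.parseString("oregon").asList()

# A Keyword refuses, because the match is followed by more word characters.
try:
    keyword_or.parseString("oregon")
    keyword_matched = True
except pp.ParseException:
    keyword_matched = False

# A standalone "or" is still accepted as a keyword.
standalone_tokens = keyword_or.parseString("or egon").asList()

print(literal_tokens, keyword_matched, standalone_tokens)
```

One thing worth checking against your own inputs: a Keyword also will not match when it directly touches an identifier character, so a query written without spaces, such as field1:a&field2:b, may need the symbol operators to stay plain literals.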
Lastly, when you want to write test strings, you can skip looping over the strings and catching ParseExceptions; just use runTests:
query_expr.runTests("""\
(field1:value1a | field1:value1b) & (field2:value2a | field2:value2b)
(field1:value1a | field1:value1b) (field2:value2a | field2:value2b)
""")
This will print each test string, followed by the parsed results, or the parse exception and a '^' where the exception occurred:
(field1:value1a | field1:value1b) & (field2:value2a | field2:value2b)
[[[Comparison:('field': 'field1', 'operator': ':', 'value': 'value1a'), '|', Comparison:('field': 'field1', 'operator': ':', 'value': 'value1b')], '&', [Comparison:('field': 'field2', 'operator': ':', 'value': 'value2a'), '|', Comparison:('field': 'field2', 'operator': ':', 'value': 'value2b')]]]
[0]:
[[Comparison:('field': 'field1', 'operator': ':', 'value': 'value1a'), '|', Comparison:('field': 'field1', 'operator': ':', 'value': 'value1b')], '&', [Comparison:('field': 'field2', 'operator': ':', 'value': 'value2a'), '|', Comparison:('field': 'field2', 'operator': ':', 'value': 'value2b')]]
[0]:
[Comparison:('field': 'field1', 'operator': ':', 'value': 'value1a'), '|', Comparison:('field': 'field1', 'operator': ':', 'value': 'value1b')]
[1]:
&
[2]:
[Comparison:('field': 'field2', 'operator': ':', 'value': 'value2a'), '|', Comparison:('field': 'field2', 'operator': ':', 'value': 'value2b')]
(field1:value1a | field1:value1b) (field2:value2a | field2:value2b)
[[[Comparison:('field': 'field1', 'operator': ':', 'value': 'value1a'), '|', Comparison:('field': 'field1', 'operator': ':', 'value': 'value1b')], '&', [Comparison:('field': 'field2', 'operator': ':', 'value': 'value2a'), '|', Comparison:('field': 'field2', 'operator': ':', 'value': 'value2b')]]]
[0]:
[[Comparison:('field': 'field1', 'operator': ':', 'value': 'value1a'), '|', Comparison:('field': 'field1', 'operator': ':', 'value': 'value1b')], '&', [Comparison:('field': 'field2', 'operator': ':', 'value': 'value2a'), '|', Comparison:('field': 'field2', 'operator': ':', 'value': 'value2b')]]
[0]:
[Comparison:('field': 'field1', 'operator': ':', 'value': 'value1a'), '|', Comparison:('field': 'field1', 'operator': ':', 'value': 'value1b')]
[1]:
&
[2]:
[Comparison:('field': 'field2', 'operator': ':', 'value': 'value2a'), '|', Comparison:('field': 'field2', 'operator': ':', 'value': 'value2b')]
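If you would rather check the outcome programmatically than read the printed dump, runTests also returns a (success, results) pair. The sketch below uses a deliberately tiny grammar (term and expr are names invented for the example) and printResults=False to suppress the output.

```python
import pyparsing as pp

term = pp.Group(pp.Word(pp.alphanums) + ":" + pp.Word(pp.alphanums + "-"))
expr = pp.OneOrMore(term)

# success is True only if every test string parsed completely.
success, results = expr.runTests("""\
    field1:value1
    field1:value1 field2:value2
    """, printResults=False)

# A string with no ':' separator cannot be parsed by this grammar.
bad_success, bad_results = expr.runTests("field1 value1", printResults=False)

print(success, len(results), bad_success)
```

Each entry in results pairs a test string with either its ParseResults or the exception that was raised, which makes it straightforward to use in an assertion.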