I wrote a simple parser for simple statements with AND/OR/NOT operations.
import pyparsing


class ClauseExpression:
    def __init__(self, tokens):
        self.tokens = tokens

    def __repr__(self):
        return "field: {}, op: {}, value: {}".format(*self.tokens)

    def asDict(self):
        return self.tokens.asDict()


clause = (
    pyparsing.oneOf("group tag").setResultsName("field")
    + pyparsing.Literal("=").setResultsName("op")
    + pyparsing.Word(pyparsing.alphanums).setResultsName("value")
).setParseAction(ClauseExpression)


class OpNode:
    def __repr__(self):
        return "{}:{!r}".format(self.op, self.items)


class UnOp(OpNode):
    def __init__(self, tokens):
        self.op = tokens[0][0]
        self.items = [tokens[0][1]]


class BinOp(OpNode):
    def __init__(self, tokens):
        self.op = tokens[0][1]
        self.items = tokens[0][::2]


statement = pyparsing.infixNotation(
    clause,
    [
        (pyparsing.CaselessLiteral("NOT"), 1, pyparsing.opAssoc.RIGHT, UnOp),
        (pyparsing.CaselessLiteral("AND"), 2, pyparsing.opAssoc.LEFT, BinOp),
        (pyparsing.CaselessLiteral("OR"), 2, pyparsing.opAssoc.LEFT, BinOp),
    ],
)


def parse_result(result):
    if isinstance(result, ClauseExpression):
        return result.asDict()
    elif isinstance(result, OpNode):
        return {
            "op": result.op,
            "items": [parse_result(item) for item in result.items],
        }
    raise TypeError("Oh Noe! Something is not right.")


if __name__ == "__main__":
    tests = (
        "tag=TagA",
        "tag=TagA AND tag=TagB",
        "tag=TagA OR NOT tag=TagB",
    )
    for item in tests:
        print("=" * 40)
        print("INPUT:", item)
        results = statement.parseString(item)
        print("OUTPUT:", parse_result(results[0]))
        print("=" * 40)
Output
========================================
INPUT: tag=TagA
OUTPUT: {'field': 'tag', 'op': '=', 'value': 'TagA'}
========================================
========================================
INPUT: tag=TagA AND tag=TagB
OUTPUT: {'op': 'AND', 'items': [{'field': 'tag', 'op': '=', 'value': 'TagA'}, {'field': 'tag', 'op': '=', 'value': 'TagB'}]}
========================================
========================================
INPUT: tag=TagA OR NOT tag=TagB
OUTPUT: {'op': 'OR', 'items': [{'field': 'tag', 'op': '=', 'value': 'TagA'}, {'op': 'NOT', 'items': [{'field': 'tag', 'op': '=', 'value': 'TagB'}]}]}
========================================
With these outputs I could dynamically generate complex lookups with Django Q objects.
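As a rough illustration of that idea, the nested dicts can be folded with the `&`, `|` and `~` operators, which Django's Q objects support. This is only a sketch: `build_lookup`, `make_leaf` and `Pred` are hypothetical names, and `Pred` is a small stand-in for Q so the snippet runs without Django. With Django, the leaf factory might be something like `lambda field, value: Q(**{field: value})`, assuming the field names map directly onto model lookups.

```python
import operator
from functools import reduce


def build_lookup(node, make_leaf):
    """Fold a parse_result()-style dict into one combined lookup object."""
    if "items" not in node:
        # Leaf clause: {'field': ..., 'op': '=', 'value': ...}
        return make_leaf(node["field"], node["value"])
    children = [build_lookup(item, make_leaf) for item in node["items"]]
    if node["op"] == "NOT":
        return ~children[0]
    combine = operator.and_ if node["op"] == "AND" else operator.or_
    return reduce(combine, children)


class Pred:
    """Hypothetical stand-in for a Q object: supports &, | and ~."""

    def __init__(self, test):
        self.test = test

    def __and__(self, other):
        return Pred(lambda row: self.test(row) and other.test(row))

    def __or__(self, other):
        return Pred(lambda row: self.test(row) or other.test(row))

    def __invert__(self):
        return Pred(lambda row: not self.test(row))


# Same shape as the OUTPUT shown for "tag=TagA OR NOT tag=TagB"
tree = {
    "op": "OR",
    "items": [
        {"field": "tag", "op": "=", "value": "TagA"},
        {"op": "NOT", "items": [{"field": "tag", "op": "=", "value": "TagB"}]},
    ],
}
lookup = build_lookup(
    tree, lambda field, value: Pred(lambda row: row.get(field) == value)
)
print(lookup.test({"tag": "TagA"}))
```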
However, I don't know how to handle whitespace and symbols (e.g. "&") in values.
For INPUT => tag=Tag C AND tag=E&F
OUTPUT => {'field': 'tag', 'op': '=', 'value': 'Tag'}
I understand that pyparsing skips whitespace by default.
I suspect I need to make use of pyparsing.White()
and exclude AND/OR/NOT, e.g. pyparsing.NotAny(pyparsing.Or(['AND', 'OR', 'NOT'])).
I'm a complete noob with pyparsing, and any help will be appreciated.
Pretty good adapting m1lton's blog post to your specialized form.
Using runTests to run your failing test string gives this:
tag=Tag C AND tag=E&F
        ^
FAIL: Expected end of text, found 'C' (at char 8), (line:1, col:9)
indicating that the multi-word tag value is not being parsed correctly.
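For reference, a failure like this can be reproduced with a minimal sketch along the following lines. It rebuilds the original grammar without the parse actions (a simplification, not the asker's exact code) and passes `printResults=False` so the report can be inspected programmatically instead of printed.

```python
import pyparsing

# Original grammar, minus the parse actions, with a single-word value
clause = (
    pyparsing.oneOf("group tag").setResultsName("field")
    + pyparsing.Literal("=").setResultsName("op")
    + pyparsing.Word(pyparsing.alphanums).setResultsName("value")
)
statement = pyparsing.infixNotation(
    clause,
    [
        (pyparsing.CaselessLiteral("NOT"), 1, pyparsing.opAssoc.RIGHT),
        (pyparsing.CaselessLiteral("AND"), 2, pyparsing.opAssoc.LEFT),
        (pyparsing.CaselessLiteral("OR"), 2, pyparsing.opAssoc.LEFT),
    ],
)

# runTests parses each line with parseAll=True and returns
# (overall_success, [(test_string, result_or_exception), ...])
success, report = statement.runTests(
    "tag=Tag C AND tag=E&F", printResults=False
)
test_string, outcome = report[0]
print(success)
print(outcome)
```

Leaving `printResults` at its default prints the annotated report with the caret marker instead.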
Usually, when people allow spaces in things like tag values, they require the values to be enclosed in quotes. The negative lookahead used below avoids that requirement here, but if you add more keywords or operators, you may have to fall back on quoting.
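If you do end up requiring quotes, one possible shape is sketched here. It assumes double quotes as the delimiter and accepts either a bare word or a quoted string; `bare_value` and `quoted_value` are illustrative names, not part of your code.

```python
import pyparsing

# Assumed convention: multi-word values must be wrapped in double quotes
bare_value = pyparsing.Word(pyparsing.alphanums)
quoted_value = pyparsing.QuotedString('"')  # strips the quotes by default

clause = (
    pyparsing.oneOf("group tag").setResultsName("field")
    + pyparsing.Literal("=").setResultsName("op")
    + (quoted_value | bare_value).setResultsName("value")
)

# Keywords inside the quotes are just part of the value
result = clause.parseString('tag="Tag C AND D"', parseAll=True)
print(result.asDict())
```

With this approach no lookahead is needed, because anything inside the quotes is taken literally.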
Here is how I would define a tag_value expression that accepts multiple words without accidentally matching any of your keywords:
# define keywords to not be accepted as tag words
any_keyword = (pyparsing.Keyword("AND")
               | pyparsing.Keyword("OR")
               | pyparsing.Keyword("NOT"))
# define a tag word using lookahead to avoid keywords - also need to add "&" symbol
tag_word = ~any_keyword + pyparsing.Word(pyparsing.alphanums+"&")
# define a tag_value that can consist of multiple tag_words
# originalTextFor will return the original text used to match, including any whitespace
tag_value = pyparsing.originalTextFor(tag_word[1, ...])
Then clause becomes just:
clause = (
    pyparsing.oneOf("group tag").setResultsName("field")
    + pyparsing.Literal("=").setResultsName("op")
    + tag_value.setResultsName("value")
).setParseAction(ClauseExpression)
Giving this test output:
tag=Tag C AND tag=E&F
[AND:[field: tag, op: =, value: Tag C, field: tag, op: =, value: E&F]]
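To see the nested-dict form that the question's parse_result produces, the pieces can be assembled into one script, roughly as follows. This is a condensed sketch that drops the `__repr__` methods; everything else follows the code above. Note that `Keyword` is case-sensitive, so a lowercase "and" would be read as part of a tag value here.

```python
import pyparsing


class ClauseExpression:
    def __init__(self, tokens):
        self.tokens = tokens

    def asDict(self):
        return self.tokens.asDict()


class OpNode:
    pass


class UnOp(OpNode):
    def __init__(self, tokens):
        self.op = tokens[0][0]
        self.items = [tokens[0][1]]


class BinOp(OpNode):
    def __init__(self, tokens):
        self.op = tokens[0][1]
        self.items = tokens[0][::2]


# Tag words may not be keywords; a value is one or more tag words
any_keyword = (pyparsing.Keyword("AND")
               | pyparsing.Keyword("OR")
               | pyparsing.Keyword("NOT"))
tag_word = ~any_keyword + pyparsing.Word(pyparsing.alphanums + "&")
tag_value = pyparsing.originalTextFor(tag_word[1, ...])

clause = (
    pyparsing.oneOf("group tag").setResultsName("field")
    + pyparsing.Literal("=").setResultsName("op")
    + tag_value.setResultsName("value")
).setParseAction(ClauseExpression)

statement = pyparsing.infixNotation(
    clause,
    [
        (pyparsing.CaselessLiteral("NOT"), 1, pyparsing.opAssoc.RIGHT, UnOp),
        (pyparsing.CaselessLiteral("AND"), 2, pyparsing.opAssoc.LEFT, BinOp),
        (pyparsing.CaselessLiteral("OR"), 2, pyparsing.opAssoc.LEFT, BinOp),
    ],
)


def parse_result(result):
    if isinstance(result, ClauseExpression):
        return result.asDict()
    if isinstance(result, OpNode):
        return {
            "op": result.op,
            "items": [parse_result(item) for item in result.items],
        }
    raise TypeError("unexpected node type: {!r}".format(result))


parsed = parse_result(
    statement.parseString("tag=Tag C AND tag=E&F", parseAll=True)[0]
)
print(parsed)
```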