What would be the best way to programmatically translate a string like
"((abc&(def|ghi))|jkl)&mno"
to be executed as:
if ((func('abc') and (func('def') or func('ghi'))) or func('jkl')) and func('mno'):
    return True
I feel like there must be a simple way to achieve this, but I can't get my head around it.
This is an interesting little problem, with a number of layers to a solution.
First off, given this sample, you need a basic infix notation parser. In pyparsing, there is a built-in helper method, infixNotation. Several pyparsing examples show how to parse a boolean expression using infixNotation. Here is a parser that will parse your sample expression:
import pyparsing as pp

sample = "((abc&(def|ghi))|jkl)&mno"

term = pp.Word(pp.alphas)
AND = pp.Literal("&")
OR = pp.Literal("|")
# operators are listed from highest to lowest precedence
expr = pp.infixNotation(term,
    [
    (AND, 2, pp.opAssoc.LEFT,),
    (OR, 2, pp.opAssoc.LEFT,),
    ])

print(expr.parseString(sample).asList())
For your sample, this will print:
[[[['abc', '&', ['def', '|', 'ghi']], '|', 'jkl'], '&', 'mno']]
You can see that we have captured not only the expression, but also the grouping by parentheses.
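To make the role of that nesting concrete, here is a small sketch that walks the nested list recursively and evaluates it directly. It is plain Python, not part of pyparsing. The func predicate and the evaluate helper are hypothetical names chosen for illustration, and the sketch assumes each nested list alternates operands and operators, as in the output above.

```python
# Nested list copied from the parser output shown above.
parsed = [[[['abc', '&', ['def', '|', 'ghi']], '|', 'jkl'], '&', 'mno']]

def func(name):
    # Hypothetical predicate; substitute your own test here.
    return name in {"abc", "ghi", "mno"}

def evaluate(node):
    # A plain string is a term: hand it to func().
    if isinstance(node, str):
        return func(node)
    # A list with a single item is just a wrapper around that item.
    if len(node) == 1:
        return evaluate(node[0])
    # Otherwise the list alternates operand, operator, operand, ...
    result = evaluate(node[0])
    for op, operand in zip(node[1::2], node[2::2]):
        if op == '&':
            result = result and evaluate(operand)
        else:  # '|'
            result = result or evaluate(operand)
    return result

print(evaluate(parsed))  # True for the hypothetical func above
```

Because the parentheses are already reflected in the nesting, the walker never has to look at precedence itself.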
We can start to do the conversion to your desired output by adding parse actions. These are parse-time callbacks that pyparsing calls to replace the parsed tokens with a different value. That value need not be a string (it could be an AST node for evaluation), but in this case we will return a modified string.
AND.addParseAction(lambda: " and ")
OR.addParseAction(lambda: " or ")
term.addParseAction(lambda t: "func('{}')".format(t[0]))
expr.addParseAction(lambda t: "({})".format(''.join(t[0])))
Parse actions can be methods with various signatures:
function()
function(tokens)
function(location, tokens)
function(input_string, location, tokens)
For AND and OR, we only need to replace the parsed operators with their corresponding "and" and "or" keywords. For the parsed variable terms, we want to change "xxx" to "func('xxx')", so we write a parse action that takes the parsed tokens and returns a modified string.
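As a quick illustration of the four signatures listed above, the following sketch attaches one parse action of each shape to a simple word expression. The function names are made up for this example; only Word, alphas, addParseAction, parseString and asList come from pyparsing.

```python
import pyparsing as pp

def no_args():
    # function(): ignore what was parsed and return a fixed replacement.
    return "WORD"

def tokens_only(tokens):
    # function(tokens): transform the matched text.
    return tokens[0].upper()

def with_location(location, tokens):
    # function(location, tokens): also receives the match position.
    return "{}@{}".format(tokens[0], location)

def with_everything(input_string, location, tokens):
    # function(input_string, location, tokens): full signature.
    return "{} (of {} chars)".format(tokens[0], len(input_string))

for action in (no_args, tokens_only, with_location, with_everything):
    # Build a fresh expression each time so the actions do not stack up.
    word = pp.Word(pp.alphas).addParseAction(action)
    print(word.parseString("abc").asList())
```

Whatever a parse action returns replaces the matched tokens; returning nothing leaves them unchanged.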
The parse action for expr is interesting because all it does is take the parsed contents, join them using ''.join(), and then wrap that in ()'s. Since expr is actually a recursive expression, we will see that it does the proper wrapping in ()'s at each level in the parsed nested list.
After adding these parse actions, we can try calling parseString() again, now giving:
["(((func('abc') and (func('def') or func('ghi'))) or func('jkl')) and func('mno'))"]
Getting close!
To do the formatting into your desired if statement, we can use another parse action. But we can't attach this parse action directly to expr, since we saw that expr (and its associated parse action) will get parsed at all levels of nesting. So instead, we can create an "outer" version of expr that is simply a container expression of an expr:
outer_expr = pp.Group(expr)
The parse action is similar to what we saw for expr, where we return a new string using the input tokens:
def format_expression(tokens):
    return "if {}:\n    return True".format(''.join(tokens[0]))

outer_expr.addParseAction(format_expression)
Now we use outer_expr to parse the input string:
print(outer_expr.parseString(sample)[0])
Getting:
if (((func('abc') and (func('def') or func('ghi'))) or func('jkl')) and func('mno')):
    return True
(There might be an extra set of ()'s on this value; they could be removed in the parse action for outer_expr if desired.)
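One way to drop that extra pair is sketched below. The name format_expression_trimmed is hypothetical, and the sketch relies on the assumption that the parse action on expr always wraps its result in a single matching pair of parentheses, so the first and last characters can be removed together. The tokens value is a hand-written stand-in for what pyparsing would pass in.

```python
def format_expression_trimmed(tokens):
    condition = ''.join(tokens[0])
    # Assumption: the whole string is enclosed by one matching pair of
    # parentheses added by expr's parse action. This would NOT be safe
    # for an arbitrary string such as "(a) and (b)".
    if condition.startswith("(") and condition.endswith(")"):
        condition = condition[1:-1]
    return "if {}:\n    return True".format(condition)

# Stand-in for the grouped tokens handed to the outer parse action.
tokens = [["(((func('abc') and (func('def') or func('ghi'))) "
           "or func('jkl')) and func('mno'))"]]
print(format_expression_trimmed(tokens))
```

With the outer pair removed, the first line of the result matches the if statement in the question.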
Finished version of the parser (uncomment the intermediate print statements to see the progression of the parser functionality):
sample = "((abc&(def|ghi))|jkl)&mno"

import pyparsing as pp

term = pp.Word(pp.alphas)
AND = pp.Literal("&")
OR = pp.Literal("|")
expr = pp.infixNotation(term,
    [
    (AND, 2, pp.opAssoc.LEFT,),
    (OR, 2, pp.opAssoc.LEFT,),
    ])
# print(expr.parseString(sample).asList())

AND.addParseAction(lambda: " and ")
OR.addParseAction(lambda: " or ")
term.addParseAction(lambda t: "func('{}')".format(t[0]))
expr.addParseAction(lambda t: "({})".format(''.join(t[0])))
# print(expr.parseString(sample).asList())

def format_expression(tokens):
    return "if {}:\n    return True".format(''.join(tokens[0]))

outer_expr = pp.Group(expr).addParseAction(format_expression)
print(outer_expr.parseString(sample)[0])
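Since the question asks for the translated string to be executed, here is one possible sketch using Python's built-in compile() and eval() on the condition text (the string produced by expr, before the if wrapper is added). The func predicate is hypothetical. Keep in mind that eval() runs whatever code it is given, so this only makes sense when the generated string is trusted; here the grammar limits terms to alphabetic words, which constrains what can end up in it.

```python
# Condition string as produced by expr.parseString(sample)[0] above.
condition = ("(((func('abc') and (func('def') or func('ghi'))) "
             "or func('jkl')) and func('mno'))")

def func(name):
    # Hypothetical predicate; substitute your own test here.
    return name in {"jkl", "mno"}

# Compile once so the same condition can be evaluated repeatedly,
# then evaluate it with func supplied explicitly as a global name.
code = compile(condition, "<condition>", "eval")
result = eval(code, {"func": func})
print(result)  # True for the hypothetical func above
```

Passing a different func in the globals dict re-evaluates the same compiled condition against another predicate.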