I am creating a program which is filtering an excel document. In some Cells, there are combinations of codes separated by Boolean expressions. I need to split these into every possibility, and I am looking at achieving this via PyParsing.
For example, if I have the following:
single = "347SJ"
single_or = "456NG | 347SJ"
and_or = "347SJ & (347SJ | 383DF)"
and_multi_or = "373FU & (383VF | 321AC | 383ZX | 842UQ)"
I want to end up with:
single = "347SJ"
single_or1 = "456NG"
single_or2 = "347SJ"
and_or1 = "347SJ & 347SJ"
and_or2 = "347SJ & 383DF"
and_multi_or1 = "373FU & 383VF"
and_multi_or2 = "373FU & 321AC"
and_multi_or3 = "373FU & 383ZX"
and_multi_or4 = "373FU & 842UQ"
I feel like it's very simple but I can't find anything similar to this online, can anyone help?
Here is a start, a parser that will process those inputs as & and | operations:
import pyparsing as pp
operand = pp.Word(pp.alphanums)
bool_expr = pp.infix_notation(operand, [
('&', 2, pp.opAssoc.LEFT),
('|', 2, pp.opAssoc.LEFT),
])
tests = [single, single_or, and_or, and_multi_or]
bool_expr.run_tests(tests)
prints
347SJ
['347SJ']
456NG | 347SJ
[['456NG', '|', '347SJ']]
[0]:
['456NG', '|', '347SJ']
347SJ & (347SJ | 383DF)
[['347SJ', '&', ['347SJ', '|', '383DF']]]
[0]:
['347SJ', '&', ['347SJ', '|', '383DF']]
[2]:
['347SJ', '|', '383DF']
373FU & (383VF | 321AC | 383ZX | 842UQ)
[['373FU', '&', ['383VF', '|', '321AC', '|', '383ZX', '|', '842UQ']]]
[0]:
['373FU', '&', ['383VF', '|', '321AC', '|', '383ZX', '|', '842UQ']]
[2]:
['383VF', '|', '321AC', '|', '383ZX', '|', '842UQ']
So that was the simple part. The next step is to traverse that structure and generate the various paths through the logical operators. This part gets kind of hairy.
PART 2 -- STOP HERE IF YOU WANT TO SOLVE THE REST YOURSELF!!!
(Much code taken from the invRegex example, which inverts a regex by generating sample strings that match it.)
Rather than extract the structure and then walk it, we can use pyparsing parse actions to add active nodes at each level that will then give us generators to generate each bit of the expression.
class GroupEmitter:
def __init__(self, exprs):
# drop the operators, take the operands
self.exprs = exprs[0][::2]
def makeGenerator(self):
def groupGen():
def recurseList(elist):
if len(elist) == 1:
if isinstance(elist[0], pp.ParseResults):
yield from recurseList(elist[0])
else:
yield from elist[0].makeGenerator()()
else:
for s in elist[0].makeGenerator()():
for s2 in recurseList(elist[1:]):
# this could certainly be improved
if isinstance(s, str):
if isinstance(s2, str):
yield [s, s2]
else:
yield [s, *s2]
else:
if isinstance(s2, str):
yield [*s, s2]
else:
yield [*s, *s2]
if self.exprs:
yield from recurseList(self.exprs)
return groupGen
class AlternativeEmitter:
def __init__(self, exprs):
# drop the operators, take the operands
self.exprs = exprs[0][::2]
def makeGenerator(self):
def altGen():
for e in self.exprs:
yield from e.makeGenerator()()
return altGen
class LiteralEmitter:
def __init__(self, lit):
self.lit = lit[0]
def __str__(self):
return "Lit:" + self.lit
def __repr__(self):
return "Lit:" + self.lit
def makeGenerator(self):
def litGen():
yield [self.lit]
return litGen
operand.add_parse_action(LiteralEmitter)
bool_expr = pp.infix_notation(operand, [
('&', 2, pp.opAssoc.LEFT, GroupEmitter),
('|', 2, pp.opAssoc.LEFT, AlternativeEmitter),
])
def invert(expr):
return bool_expr.parse_string(expr)[0].makeGenerator()()
for t in tests:
print(t)
print(list(invert(t)))
print("\n".join(' & '.join(tt) for tt in invert(t)))
print()
prints
347SJ
[['347SJ']]
347SJ
456NG | 347SJ
[['456NG'], ['347SJ']]
456NG
347SJ
347SJ & (347SJ | 383DF)
[['347SJ', '347SJ'], ['347SJ', '383DF']]
347SJ & 347SJ
347SJ & 383DF
373FU & (383VF | 321AC | 383ZX | 842UQ)
[['373FU', '383VF'], ['373FU', '321AC'], ['373FU', '383ZX'], ['373FU', '842UQ']]
373FU & 383VF
373FU & 321AC
373FU & 383ZX
373FU & 842UQ