Search code examples
pythonboolean-logicpyparsing

Split a Boolean expression into all possibilities with PyParsing


I am creating a program which is filtering an excel document. In some Cells, there are combinations of codes separated by Boolean expressions. I need to split these into every possibility, and I am looking at achieving this via PyParsing.

For example, if I have the following:

single = "347SJ"
single_or = "456NG | 347SJ"
and_or = "347SJ & (347SJ | 383DF)"
and_multi_or = "373FU & (383VF | 321AC | 383ZX | 842UQ)"

I want to end up with:

single = "347SJ"
single_or1 = "456NG"
single_or2 = "347SJ"
and_or1 = "347SJ & 347SJ"
and_or2 = "347SJ & 383DF"
and_multi_or1 = "373FU & 383VF"
and_multi_or2 = "373FU & 321AC"
and_multi_or3 = "373FU & 383ZX"
and_multi_or4 = "373FU & 842UQ"

I feel like it's very simple but I can't find anything similar to this online, can anyone help?


Solution

  • Here is a start, a parser that will process those inputs as & and | operations:

    import pyparsing as pp
    
    operand = pp.Word(pp.alphanums)
    
    bool_expr = pp.infix_notation(operand, [
        ('&', 2, pp.opAssoc.LEFT),
        ('|', 2, pp.opAssoc.LEFT),
    ])
    
    tests = [single, single_or, and_or, and_multi_or]
    
    bool_expr.run_tests(tests)
    

    prints

    347SJ    
    ['347SJ']
    
    456NG | 347SJ
    [['456NG', '|', '347SJ']]
    [0]:
      ['456NG', '|', '347SJ']
    
    347SJ & (347SJ | 383DF)
    [['347SJ', '&', ['347SJ', '|', '383DF']]]
    [0]:
      ['347SJ', '&', ['347SJ', '|', '383DF']]
      [2]:
        ['347SJ', '|', '383DF']
    
    373FU & (383VF | 321AC | 383ZX | 842UQ)
    [['373FU', '&', ['383VF', '|', '321AC', '|', '383ZX', '|', '842UQ']]]
    [0]:
      ['373FU', '&', ['383VF', '|', '321AC', '|', '383ZX', '|', '842UQ']]
      [2]:
        ['383VF', '|', '321AC', '|', '383ZX', '|', '842UQ']
    

    So that was the simple part. The next step is to traverse that structure and generate the various paths through the logical operators. This part gets kind of hairy.

    PART 2 -- STOP HERE IF YOU WANT TO SOLVE THE REST YOURSELF!!!

    (Much code taken from the invRegex example, which inverts a regex by generating sample strings that match it.)

    Rather than extract the structure and then walk it, we can use pyparsing parse actions to add active nodes at each level that will then give us generators to generate each bit of the expression.

    class GroupEmitter:
        def __init__(self, exprs):
            # drop the operators, take the operands
            self.exprs = exprs[0][::2]
    
        def makeGenerator(self):
            def groupGen():
                def recurseList(elist):
                    if len(elist) == 1:
                        if isinstance(elist[0], pp.ParseResults):
                            yield from recurseList(elist[0])
                        else:
                            yield from elist[0].makeGenerator()()
                    else:
                        for s in elist[0].makeGenerator()():
                            for s2 in recurseList(elist[1:]):
                                # this could certainly be improved
                                if isinstance(s, str):
                                    if isinstance(s2, str):
                                        yield [s, s2]
                                    else:
                                        yield [s, *s2]
                                else:
                                    if isinstance(s2, str):
                                        yield [*s, s2]
                                    else:
                                        yield [*s, *s2]
    
                if self.exprs:
                    yield from recurseList(self.exprs)
    
            return groupGen
    
    
    class AlternativeEmitter:
        def __init__(self, exprs):
            # drop the operators, take the operands
            self.exprs = exprs[0][::2]
    
        def makeGenerator(self):
            def altGen():
                for e in self.exprs:
                    yield from e.makeGenerator()()
    
            return altGen
    
    
    class LiteralEmitter:
        def __init__(self, lit):
            self.lit = lit[0]
    
        def __str__(self):
            return "Lit:" + self.lit
    
        def __repr__(self):
            return "Lit:" + self.lit
    
        def makeGenerator(self):
            def litGen():
                yield [self.lit]
    
            return litGen
    
    
    operand.add_parse_action(LiteralEmitter)
    bool_expr = pp.infix_notation(operand, [
        ('&', 2, pp.opAssoc.LEFT, GroupEmitter),
        ('|', 2, pp.opAssoc.LEFT, AlternativeEmitter),
    ])
    
    
    def invert(expr):
        return bool_expr.parse_string(expr)[0].makeGenerator()()
    
    
    for t in tests:
        print(t)
        print(list(invert(t)))
        print("\n".join(' & '.join(tt) for tt in invert(t)))
        print()
    

    prints

    347SJ
    [['347SJ']]
    347SJ
    
    456NG | 347SJ
    [['456NG'], ['347SJ']]
    456NG
    347SJ
    
    347SJ & (347SJ | 383DF)
    [['347SJ', '347SJ'], ['347SJ', '383DF']]
    347SJ & 347SJ
    347SJ & 383DF
    
    373FU & (383VF | 321AC | 383ZX | 842UQ)
    [['373FU', '383VF'], ['373FU', '321AC'], ['373FU', '383ZX'], ['373FU', '842UQ']]
    373FU & 383VF
    373FU & 321AC
    373FU & 383ZX
    373FU & 842UQ