Search code examples
pythonpython-itertools

Explore all possible BOOLEAN combinations of variables


I have a list: [A,B,C] and I want to explore all possible logic combinations. I have got as far as generating:

[A,AND,B,AND,C]
[A,AND,B,OR,C]
[A,OR,B,AND,C]
[A,OR,B,OR,C]
[A,AND,C,OR,B]

This was partially achieved using the following:

def get_macro_comb(keys):
    all_comb = []
    for combination in itertools.product(BOOL, repeat=get_needed_bool(keys)):
        all_comb.append(combination)
    expanded_list = list(set(all_comb))
    
    all_aud = []
    for boo in expanded_list:
        aud = [x for x in chain(*zip_longest(keys, boo)) if x is not None]
        all_aud.append(aud)
    return all_aud


def get_needed_bool(l):
    needed_bool = len(l) -1
    return needed_bool
    
BOOL = ['AND','OR']
full_aud = []
for comb in get_macro_comb(['A','B','C']):
    aud = ' '.join(comb)
    full_aud.append(comb)

and I want to group this for a query, so taking the second example I want:

(A AND B) OR C
A AND (B OR C)

and I want this generalised to n items in a list.

I have only managed to think of a bullish approach:

x =['A','AND','B','OR','C']
aud = '({})'.format(' '.join(x[i-1:i+2]))
rest_aud = ' '.join(x[i+2:])
aud = aud + ' ' + rest_aud
print(aud)

aud = '({})'.format(' '.join(x[i+1:]))
rest_aud = ' '.join(x[i-1:i+1])
aud =  rest_aud + ' ' + aud
print(aud)

which returns:

(A AND B) OR C
A AND (B OR C)

clearly non-generalisable


Solution

  • If you include permutations of the symbols (e.g. "A OR B AND C" ... "A AND C OR B"), implement operator precedence (i.e. AND before OR) and want to exclude formulation that produce the same True/False result for every input pattern, the logic can get quite complex:

    Here's my take on it:

    • I first view the operations as a binary tree which recursively combines a left and right side either with an OR or an AND operator.
    • To implement permutations, the left side is formed of the powerset of combinations that use half of the values or less (with the right side using the remaining values
    • I only use up to half of the values on the left side because if I used more than half, then the right side would contain combinations that have already been seen on the left, and since OR and AND are commutative, those additional distributions would only produce expressions that are equivalent
    • When the number of symbols is even and both sides are equal sized, avoiding the mirroring distributions is handled using a set where the leftside combinations are stored and excluded from being used on the right side
    • Still this will produce equivalent expressions that need to be excluded from the result
    • Because of the recursive nature of the approach, I use a cache so that combinations of symbols are only computed once

    This function tests if two expressions are equivalent. It does so by feeding each expression with all combinations of True/False patterns and evaluating them to compare results. If the two expressions produce the same result for all patterns, then they are equivalent.

    def isEquivalent(A,B):
        if A.count("and") != B.count("and") \
        or A.count("or")  != B.count("or"):
            return False
        symbols = A
        for c in ("(",")"," or"," and"):
            symbols = symbols.replace(c,"")
        symbols = symbols.split()
        different = compile(f"({A}) != ({B})","<string>","eval")
        for n in range(1<<len(symbols)):
            pattern = dict(zip(symbols,map(int,f"{n:0{len(symbols)}b}")))
            if eval(different,pattern):
                return False
        return True
    

    Caching of expression is based on the number of symbols. The cache contains the list of expressions with generic placeholders for the symbols which are mapped to the actual keys using string formatting. For example the result for keys XYZ are the same as those of ABC except that X, Y an Z take the places of the original A, B and C. The cache actually contains expressions like "{2} and {1} or {0}".

    cache = dict()
    def cacheResult(keys,result=None):
        if not result:
            return [ r.format(*keys) for r in cache.get(len(keys),[]) ]   
        cache[len(keys)] = resFormat = []
        for r in result:
            r = r.replace("and","&").replace("or","|")
            for i,k in enumerate(keys):
                r = r.replace(k,"{"+str(i)+"}")
            r = r.replace("&","and").replace("|","or")
            resFormat.append(r)
        return result
    

    This is the recursive function that generates the expressions. It adds parentheses as appropriate when ANDing sub-expressions that contain a top level OR operation.

    from itertools import combinations
       
    def boolCombo(keys):
        if len(keys)==1: return list(keys)
        result = cacheResult(keys)
        if result: return result
        def addResult(X): 
            if not any(isEquivalent(X,r) for r in result):
                result.append(X)
                
        seenLeft  = set()
        for leftSize in range(1,len(keys)//2+1):
            for leftIdx in combinations(range(len(keys)),leftSize):
                rightKeys = list(keys)
                leftKeys  = tuple(rightKeys.pop(i) for i in reversed(leftIdx))
                rightKeys = tuple(rightKeys)
                if len(leftKeys)==len(rightKeys):
                    if rightKeys in seenLeft: continue
                    seenLeft.add(leftKeys)
                for left in boolCombo(leftKeys):
                    pleft = left
                    op = (p.count("(")==p.count(")") for p in left.split(" or "))
                    if " or " in left and any(op):
                        pleft = f"({left})"
                    for right in boolCombo(rightKeys):
                        addResult(f"{left} or {right}")
                        op = (p.count("(")==p.count(")")
                              for p in right.split(" or "))
                        if " or " in right and any(op):
                            right = f"({right})"
                        addResult(f"{pleft} and {right}")
        return cacheResult(keys,result)    
    

    output:

    symbols = "ABC"            
    
    for i,combo in enumerate(boolCombo(symbols),1):
        print(i,combo)   
    
    1 A or B or C
    2 A and (B or C)
    3 A or B and C
    4 A and B and C
    5 B and (A or C)
    6 B or A and C
    7 C and (A or B)
    8 C or A and B
    
    len(boolCombo("ABCD"))   # 52
    len(boolCombo("ABCDE"))  # 472
    len(boolCombo("ABCDEF")) # 5504 takes 10 minutes to finish
    

    [EDIT] Improved version

    While fiddling to improve performance, I realized that the cases where IsEquivalent would find matches were all due to operand ordering. And, since checking for equivalent expressions takes most of the processing time, I made sure to generate expressions with a constant ordering. This allowed me to do away with isEquivalent because equivalent expressions would now produce the exact same string:

    The cacheResult function is the same but I added a sort on the results to make the output easier to read:

    cache = dict()
    def cacheResult(keys,result=None):
        if not result:
            return [ r.format(*keys) for r in cache.get(len(keys),[]) ]   
        cache[len(keys)] = resFormat = []
        result = sorted(result,key=lambda x:x.replace("  "," "))
        for r in result:
            r = r.replace("and","&").replace("or","|")
            for i,k in enumerate(keys):
                r = r.replace(k,"{"+str(i)+"}")
            r = r.replace("&","and").replace("|","or")
            resFormat.append(r)
        return result
    

    The boolCombo function now distinguishes between top level AND and OR so that the operands can be sorted in ascending order when the expressions is formed of multiple ORs or ANDs at the top precedence level. For example B and C and A would be reordered to A and B and C which can now be identified as duplicates simply on the basis of the string representation being the same. Using a set instead of a list for result automatically eliminates duplicates.

    from itertools import combinations,product
    or0,or1,and0,and1 = "  or  "," or ","  and  "," and "   
    def boolCombo(keys):
        if len(keys)==1: return list(keys)
        result = cacheResult(keys) or set()
        if result: return result
        def addResult(left,right):
            OR = or0.join(sorted(left.split(or0)+right.split(or0)))
            result.add(OR.replace(and0,and1))
            if or0 in left:  left  = f"({left})"
            if or0 in right: right = f"({right})"
            AND = and0.join(sorted(left.split(and0)+right.split(and0)))
            result.add(AND.replace(or0,or1))
                
        seenLeft  = set()
        for leftSize in range(1,len(keys)//2+1):
            for leftKeys in combinations(keys,leftSize):
                rightKeys = [k for k in keys if k not in leftKeys]
                if len(leftKeys)==len(rightKeys):
                    if tuple(rightKeys) in seenLeft: continue
                    seenLeft.add(tuple(leftKeys))
                for left,right in product(*map(boolCombo,(leftKeys,rightKeys))):
                    addResult(left,right)
        return cacheResult(keys,result)
    

    output:

    symbols = "ABC"            
    
    for i,combo in enumerate(boolCombo(symbols),1):
        print(i,combo.replace("  "," "))   
    
    1 (A or B) and C
    2 (A or C) and B
    3 (B or C) and A
    4 A and B and C
    5 A and B or C
    6 A and C or B
    7 A or B and C
    8 A or B or C
    
    len(boolCombo("ABCD"))      # 52
    len(boolCombo("ABCDE"))     # 472
    len(boolCombo("ABCDEF"))    # 5504
    len(boolCombo("ABCDEFG"))   # 78416
    len(boolCombo("ABCDEFGH"))  # 1320064  (12 seconds)
    len(boolCombo("ABCDEFGHI")) # 25637824 (4.8 minutes)