Tags: python, regex, powerset

Specialized Powerset Requirement


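Let's say I have compiled five regular expression patterns and then stored the result of each search in a variable. Each result is a match object if the pattern matched and None otherwise, so I can treat the five variables as Booleans: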

a = re.search(first, mystr)
b = re.search(second, mystr)
c = re.search(third, mystr)
d = re.search(fourth, mystr)
e = re.search(fifth, mystr)

I want to use the power set of (a, b, c, d, e) in a function so that it tries the more specific matches first and then falls through to the less specific ones. In other words, the power set (or rather its list representation) should be sorted by number of elements, descending.

Desired behavior:

 if a and b and c and d and e:
     return 'abcde' 
 if a and b and c and d:
     return 'abcd'
 [... and all the other 4-matches ]
 [now the three-matches]
 [now the two-matches]
 [now the single matches]
 return 'No Match'  # did not match anything

Is there a way to use the power set programmatically, and ideally tersely, to get this function's behavior?


Solution

  • You could use the powerset() generator function recipe in the itertools documentation like this:

    from itertools import chain, combinations
    from pprint import pprint
    import re
    
    def powerset(iterable):
        "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
        s = list(iterable)
        return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
    
    mystr   = "abcdefghijklmnopqrstuvwxyz"
    first   = "a"
    second  = "B"  # won't match, should be omitted from result
    third   = "c"
    fourth  = "d"
    fifth   = "e"
    
    a = 'a' if re.search(first, mystr) else ''
    b = 'b' if re.search(second, mystr) else ''
    c = 'c' if re.search(third, mystr) else ''
    d = 'd' if re.search(fourth, mystr) else ''
    e = 'e' if re.search(fifth, mystr) else ''
    
    # Keep only the labels whose pattern matched (empty strings are falsy).
    elements = [elem for elem in [a, b, c, d, e] if elem]
    # Largest subsets first; `if group` drops the empty subset.
    spec_ps = [''.join(group)
               for group in sorted(powerset(elements), key=len, reverse=True)
               if group]
    
    pprint(spec_ps)
    

    Output:

    ['acde',
     'acd',
     'ace',
     'ade',
     'cde',
     'ac',
     'ad',
     'ae',
     'cd',
     'ce',
     'de',
     'a',
     'c',
     'd',
     'e']
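
To get the fall-through behavior the question asks for, the same idea can be wrapped in a function that returns the first (most specific) group. This is only a minimal sketch: the name `most_specific_match` and the `labelled_patterns` list of (label, pattern) pairs are hypothetical and not part of the original answer. It walks the subset sizes from largest to smallest instead of sorting the whole power set.

```python
import re
from itertools import combinations

def most_specific_match(labelled_patterns, text):
    """Return the joined labels of the largest group of patterns that all match.

    labelled_patterns is a hypothetical list of (label, pattern) pairs.
    """
    # Run each search once; re.search returns a match object or None.
    results = {label: bool(re.search(pattern, text))
               for label, pattern in labelled_patterns}
    labels = [label for label, _ in labelled_patterns]
    # Try the biggest subsets first so more specific matches win.
    for size in range(len(labels), 0, -1):
        for group in combinations(labels, size):
            if all(results[label] for label in group):
                return ''.join(group)
    return 'No Match'  # did not match anything

patterns = [('a', 'a'), ('b', 'B'), ('c', 'c'), ('d', 'd'), ('e', 'e')]
print(most_specific_match(patterns, "abcdefghijklmnopqrstuvwxyz"))  # acde
print(most_specific_match(patterns, "xyz"))                         # No Match
```

Because the first group that passes is always the full set of patterns that matched, the loop returns the same string as joining the labels of all matching patterns. The explicit loop is mainly useful if some subsets need special handling.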