Search code examples
pythonperformanceoptimizationbinpowerset

Determine subsets among members of a powerset efficiently in Python


So I've indexed the members of the powerset of the alphabet in the following way:

def bytecode_index(some_subset):
    mask = 0
    for index in bytearray(some_subset):
        mask |= (1<<(index-97))
    return mask

This might be a bit nonstandard, and improvements with respect to improving it are very much welcome, but the crux of my question is actually as follows:

How can I take two such masks and determine if one is a subset of the other efficiently and pythonically? One such way for determining if index1 is a subset of index2 is to compare their binary strings. if index1 has a 1 where index2 has a 0, the set corresponding to index1is not a subset of the set corresponding to index2.

I've written something that looks like this for that:

def compare_binary_strings(index1,index2):

    return not any(x == "1" and y == "0" for x,y in zip(bin(index1), bin(index2)))

This seems inefficient, as it involves converting the indices to strings and then comparing them componentwise. Any and all help is much appreciated.

Is there a more simple operation available to quickly compare the two indices?


Solution

  • Well I don't know about Pythonically, but generically the way to check if one bitmask is a subset of the other is:

    (x & y) == x
    

    Iff true, x is a subset of y.

    This just the bitmath equivalent of the familiar

    A ⊆ B ⇔ A ∩ B = A