Search code examples
pythonpython-3.xlistdictionaryhashtable

Most efficient way to check if a group is a subgroup of other group using any data structure on Python 3


If I can choose between list of objects, lists of integers and dictionaries (is there any other data structure option?), which is the most efficient data structure to discover if a group is a subgroup of other group, and how to implement?

I have a class with unique integers atributes, so I can "represent" the class object's by the integer they contain, or the object (pointer) itself.

a.atributeEx = 1
b.atributeEx = 2
c.atributeEx = 3

listOfPointers = [a,b,c]
listAtributes = [1,2,3]
dictEx = ['1':1, '2':2, '3':3]

One option is to use issubset, as bellow:

listAtributes2 = [1,2]
set(listAtributes).issubset(listAtributes2)

However, using the issubset function with a list of atributes, my code would take months to years to finish to run, as this has to be performed billions of times. Usually, one list has from 1 to 4 elements while the other has from 200 to 2000 elements.

What is the best approach for this problem?


Solution

  • Using set methods should be fairly efficient. In particular, if you do x.issubset(y), only the size of x matters since a set uses hashing to check if it contains a given item.

    Although, if you are instantiating a set for every comparison, then you are adding a big overhead to your computations.

    One solution is to instantiate a new set in your __init__ method only and to populate it in your __setattr__ method. To allow attribute deletion as well, you can define a __delattr__ method that removes elements from the set. Finally, you can use the __contains__ method to execute the subset comparison when using the keyword in.

    class Container:
        def __init__(self, **kwargs):
            self._attr_set = set()
            for k, v in kwargs.items():
                setattr(self, k, v)
    
        def __setattr__(self, key, value):
            self._attr_set.add((key, value))
            super().__setattr__(key, value)
    
        def __delattr__(self, item):
            for x in self._attr_set:
                if x[0] == item:
                    self._attr_set.remove(x)
                    break
            super().__delattr__(item)
    
    
        def __contains__(self, item):
            return item._attr_set.issubset(self._attr_set)
    

    Example

    x = Container(a=1, b=2, c=3)
    y = Container(a=1, b=2)
    
    print(y in x) # True