Search code examples
pythonalgorithmset-intersection

Finding n largest intersections of collection of sets in Python


I have a collection (dictionary) of sets of strings, like {1: {'a', 'b'}, ...} I need to find the n largest intersections i.e. intersections of the largest subsets of the collection. The obvious brute-force approach:

for i in range(len(collection),2,-1):
    for subset in combinations(sorted(collection), i):
        intersected = set.intersection(*(collection[k] for k in subset))
        if len(intersected)>0:
            yield len(subset), intersected

is very slow. Is there some efficient way/library to do this?


Solution

  • Just count the number of occurances of each string. The maximum number of occurances is the largest intersection of subsets (assuming a strings are unique in each subset):

    coll = {1:{'a','b'}, 2:{'b','e'}, 3:{'a','c'}, 4:{'b','f'}}
    print(coll)
    
    d=dict()
    for subs in coll.values():
      for s in subs:
        d[s]=d.setdefault(s, 0)+1
    
    m=max(d.values())
    print(m)