I have a collection (dictionary) of sets of strings, like {1: {'a', 'b'}, ...}
I need to find the n largest intersections i.e. intersections of the largest subsets of the collection. The obvious brute-force approach:
for i in range(len(collection),2,-1):
for subset in combinations(sorted(collection), i):
intersected = set.intersection(*(collection[k] for k in subset))
if len(intersected)>0:
yield len(subset), intersected
is very slow. Is there some efficient way/library to do this?
Just count the number of occurances of each string. The maximum number of occurances is the largest intersection of subsets (assuming a strings are unique in each subset):
coll = {1:{'a','b'}, 2:{'b','e'}, 3:{'a','c'}, 4:{'b','f'}}
print(coll)
d=dict()
for subs in coll.values():
for s in subs:
d[s]=d.setdefault(s, 0)+1
m=max(d.values())
print(m)