I have a list of lists. I want to remove lists if the lenght of intersection of values in the list with previous lists is more than one.
For example
A=[(1,2,3,4), (2,3,5,6),(1,7,8,9),(2,4,6,8)]
We need to keep first list because there is no previous list. We remove (2,3,5,6)
as its intersection of previous list is (2,3)
and its length is 2
. We keep (1,7,8,9)
and remove (2,4,6,8)
likewise.
At the end our list should become B=[(1,2,3,4), (1,7,8,9)]
A potentially more efficient solution, in case you have (and keep) many more but still short tuples. This checks pairs of numbers in the lists. Keep the current tuple a
if none of its pairs occur in an already kept tuple.
from itertools import combinations
A = [(1,2,3,4), (2,3,5,6), (1,7,8,9), (2,4,6,8)]
B = []
B_pairs = set()
for a in A:
pairs = set(map(frozenset, combinations(a, 2)))
if not B_pairs & pairs:
B.append(a)
B_pairs |= pairs
print(B)
Some benchmark with Ouss's solution, since they brought that up:
With 5000 tuples, 2553 of which are kept:
23 ms Kelly
2853 ms Ouss
With 5000 tuples, 2540 of which are kept:
21 ms Kelly
2776 ms Ouss
With 5000 tuples, 2558 of which are kept:
20 ms Kelly
2685 ms Ouss
Benchmark code (Try it online!):
from itertools import combinations
def Kelly(A):
B = []
B_pairs = set()
for a in A:
pairs = set(map(frozenset, combinations(a, 2)))
if not B_pairs & pairs:
B.append(a)
B_pairs |= pairs
return B
def Ouss(A):
B=[]
# iterate over sublists a of the list A
for a in A:
# define and reset the intersections counter
intersections=0
# iterate of sublists b of B
for b in B:
# reset the intersections counter
intersections=0
# iterate over the numbers num of sublist a
for num in a:
# perform checks
if num in b:
intersections+=1
if intersections>1:
break
# break the iteration over sublists b if check fulfilled
if intersections>1:
break
# Append a to the subset B only if intersection were not > 1
if intersections>1:
continue
B.append(a)
# repeat
#B now contains the result...
return B
from timeit import timeit
from random import sample
A = [tuple(sample(range(100), 4)) for _ in range(500)]
print(Kelly(A) == Ouss(A), len(Kelly(A)))
for _ in range(3):
A = [tuple(sample(range(400), 4)) for _ in range(5000)]
print(f'With {len(A)} tuples, {len(Kelly(A))} of which are kept:')
for func in Kelly, Ouss:
time = timeit(lambda: func(A), number=1)
print('%4d ms ' % (time * 1e3), func.__name__)
print()