Search code examples
pythonlistintersection

Removing a list in a list of lists if a condition not satisfied


I have a list of lists. I want to remove lists if the lenght of intersection of values in the list with previous lists is more than one.

For example

A=[(1,2,3,4), (2,3,5,6),(1,7,8,9),(2,4,6,8)]

We need to keep first list because there is no previous list. We remove (2,3,5,6) as its intersection of previous list is (2,3) and its length is 2. We keep (1,7,8,9) and remove (2,4,6,8) likewise.

At the end our list should become B=[(1,2,3,4), (1,7,8,9)]


Solution

  • A potentially more efficient solution, in case you have (and keep) many more but still short tuples. This checks pairs of numbers in the lists. Keep the current tuple a if none of its pairs occur in an already kept tuple.

    from itertools import combinations
    
    A = [(1,2,3,4), (2,3,5,6), (1,7,8,9), (2,4,6,8)]
    
    B = []
    B_pairs = set()
    for a in A:
        pairs = set(map(frozenset, combinations(a, 2)))
        if not B_pairs & pairs:
            B.append(a)
            B_pairs |= pairs
    print(B)
    

    Some benchmark with Ouss's solution, since they brought that up:

    With 5000 tuples, 2553 of which are kept:
      23 ms  Kelly
    2853 ms  Ouss
    
    With 5000 tuples, 2540 of which are kept:
      21 ms  Kelly
    2776 ms  Ouss
    
    With 5000 tuples, 2558 of which are kept:
      20 ms  Kelly
    2685 ms  Ouss
    

    Benchmark code (Try it online!):

    from itertools import combinations
    
    def Kelly(A):
      B = []
      B_pairs = set()
      for a in A:
        pairs = set(map(frozenset, combinations(a, 2)))
        if not B_pairs & pairs:
            B.append(a)
            B_pairs |= pairs
      return B
    
    def Ouss(A):
      B=[]
    
      # iterate over sublists a of the list A
      for a in A:
        # define and reset the intersections counter
        intersections=0
        # iterate of sublists b of B
        for b in B:
            # reset the intersections counter
            intersections=0
            # iterate over the numbers num of sublist a
            for num in a:
                # perform checks
                if num in b:
                    intersections+=1
                if intersections>1:
                    break
            # break the iteration over sublists b if check fulfilled
            if intersections>1:
                break
        # Append a to the subset B only if intersection were not > 1 
        if intersections>1:
            continue
        B.append(a)
        # repeat
      #B now contains the result...
      return B
    
    from timeit import timeit
    from random import sample
    
    A = [tuple(sample(range(100), 4)) for _ in range(500)]
    print(Kelly(A) == Ouss(A), len(Kelly(A)))
    
    for _ in range(3):
        A = [tuple(sample(range(400), 4)) for _ in range(5000)]
        print(f'With {len(A)} tuples, {len(Kelly(A))} of which are kept:')
        for func in Kelly, Ouss:
            time = timeit(lambda: func(A), number=1)
            print('%4d ms ' % (time * 1e3), func.__name__)
        print()