Search code examples
pythonperformancelisttuplesmatching

Python: Speed up matching in multiple lists


I have four lists like these:

L = [ (1,2), (3,5), (6,10), (7,8) ]
M = [ (1,3), (8,9), (12,13) ]
N = [ (6,10), (3,4), (5,6), (10,11), (12,13) ]
T = [ (6,10) , (1,4) ]

I want to check the presence/absence of every tuple of T inside L, M, and N:

[[True, False, True], [False, False, False]]

The following works but it is incredibly inefficient when the size of T, L, M, and N grows.

[[ y in x for x in [L, M, N] ] for y in T ]

What is the most efficient way to speed up this for large lists?


Solution

  • Searching in list time is proportional to the list length. So it's high for long lists. There are special data structures optimized for searching. The easiest in python is set. It calculates the hash of each element (so the elements must be hashable, and tuples of integers are OK). Then you do the same check. So you just need to add

    L = set(L)
    M = set(M)
    N = set(N)
    

    As a side effect you will lose the order of elements in the list. And if there are non unique values they will be merged into one.

    Moreover sets have method for finding common elements. L & M will give you elements which exists in L and in M. That would be the fastest solution.

    About speed of search an item in list and set: Speed of search in list might be high if the searched values are at the very beginning (or the list is short). But if it is not the case set is much-much faster as time of search is proportional to log(len(data)). The worst case for list is when there are no searched items in the list, so it will need to check every item. In this case searching in 1M list would be 200K slower, than in set (just checked in python3)