Tags: python, list, for-loop, vectorization

How do I efficiently find which elements of a list are in another list?


I want to know which elements of list_1 are in list_2. I need the output as an ordered list of booleans. But I want to avoid for loops, because both lists have over 2 million elements.

This is what I have and it works, but it's too slow:

list_1 = [0,0,1,2,0,0]
list_2 = [1,2,3,4,5,6]

booleans = []
for i in list_1:
    booleans.append(i in list_2)

# booleans = [False, False, True, True, False, False]

I could split the list and use multithreading, but I would prefer a simpler solution if possible. I know some functions like sum() use vector operations. I am looking for something similar.

How can I make my code more efficient?


Solution

  • If you want a vectorized approach, you can also use NumPy's isin. It's not the fastest method, as demonstrated by oda's excellent post, but it's definitely an alternative to consider.

    import numpy as np
    
    list_1 = [0,0,1,2,0,0]
    list_2 = [1,2,3,4,5,6]
    
    a1 = np.array(list_1)
    a2 = np.array(list_2)
    
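    # Element-wise membership test: True where an element of a1 occurs in a2
    np.isin(a1, a2)
    # array([False, False,  True,  True, False, False])
    # The result is a NumPy boolean array; call .tolist() on it for a plain Python list.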
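
For comparison, here is a pure-Python sketch that needs no extra library. The loop in the question is slow mainly because each "i in list_2" scans the whole list; building a set from list_2 once makes every lookup constant time on average. This assumes the elements are hashable, and the helper name membership_mask is made up for illustration, not taken from the question or any library.

```python
def membership_mask(values, candidates):
    """Return a list of booleans: True where an element of `values` occurs in `candidates`."""
    # Build the set once (hypothetical helper; assumes hashable elements).
    lookup = set(candidates)
    # Order of `values` is preserved, so the output lines up with the input.
    return [v in lookup for v in values]


list_1 = [0, 0, 1, 2, 0, 0]
list_2 = [1, 2, 3, 4, 5, 6]

booleans = membership_mask(list_1, list_2)
print(booleans)  # [False, False, True, True, False, False]
```

This still iterates in Python, so it is not vectorized in the NumPy sense, but it avoids rescanning list_2 for every element. How it compares to np.isin on 2 million elements depends on the data, so it is worth timing both on your own lists.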