Tags: python, python-3.x, optimization, bigdata, masking

Creating a long masking list (Python)


Here is what I have:

long_list = a very long list of integer values (6M+ entries)

wanted_list = a list of integer values that are of interest (70K entries)


What I need:

mask_list = a list of booleans of the same length as long_list, describing whether each element of long_list is present in wanted_list (i.e. [is long_list[0] in wanted_list?, is long_list[1] in wanted_list?, ...]). The number of 'True' entries in this list should be the same as len(wanted_list).


I have working code that uses a for loop, but as expected it is far too slow for lists of this size (it takes several minutes to run):

masklist = []

for element in long_list:
    if element in wanted_list:
        masklist.append(True)
    else:
        masklist.append(False)

Is there a more elegant and faster way to achieve this? I looked into the numpy.ma module, but could not think of an elegant way to apply it to this problem.


Solution

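  • You can use numpy.isin for this. It returns a boolean NumPy array with one entry per element of long_list; call .tolist() on the result if you need a plain Python list:

    import numpy as np

    masklist = np.isin(long_list, wanted_list)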
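
To make the behaviour concrete, here is a minimal sketch on toy data. The sample values and the names mask_array, wanted_set and mask_from_set are made up for illustration and are not from the question. It also shows a pure-Python alternative: the original loop is slow mainly because `element in wanted_list` scans the whole list each time, whereas membership tests against a set take roughly constant time on average.

```python
import numpy as np

# Small stand-ins for the real data (hypothetical values).
long_list = [5, 3, 8, 1, 9]
wanted_list = [3, 1, 7]

# NumPy route: a boolean ndarray with one entry per element of long_list.
mask_array = np.isin(long_list, wanted_list)
mask_list = mask_array.tolist()  # convert to a plain list of Python bools

# Pure-Python route: build a set once, then test membership per element.
wanted_set = set(wanted_list)
mask_from_set = [element in wanted_set for element in long_list]

print(mask_list)      # [False, True, False, True, False]
print(mask_from_set)  # [False, True, False, True, False]
```

Note that the number of True entries equals len(wanted_list) only if every wanted value occurs exactly once in long_list. In the toy data above, 7 never appears, so there are two True entries for three wanted values.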