Let's say I have following example
items = ['milk', 'bread', 'water']
df:
name item1 item2 item3
items_1 milk water
items_2 milk rubber juice
items_3 juice paper wood
items_4 bread
items_5 bread water milk
items_6 milk juice
In this example I would like to get all df rows whose members are completely in items list, which means:
Now, the real "df" dataframe will contain several millions of rows, i.e. items_*, hence the "efficiently" in the title. The number of columns of "df" will be between 10 and 20. Also, there will be several thousands of "items" lists with between 10 and 20 elements.
Can someone please help me out on this one?
Use ~isin to check if the condition is NOT True for all the values, get the index, use boolean indexing. You get
true_names = df[~df.iloc[:, 1:].isin(items)].isnull().all(1)
df.loc[true_names, 'name']
0 name_1
3 name_4
4 name_5