I have a dataset that consists of a single column of frozenset combinations.
Data
import pandas as pd
import numpy as np
d = {'ID1': [frozenset(['a', 'b']), frozenset(['a','c']), frozenset(['c','d'])]}
df = pd.DataFrame(data=d)
Furthermore, I have a list of letters, and I would like to get a list of the indices of the rows in the dataset where an item from that list appears. Assume the following list:
lst = ['a', 'b']
indexSaver = []
I can work around this with a for loop; however, the dataset consists of over 27 million rows, so I'm quite sure a faster approach would save me a lot of time.
for i in range(len(df)):
    for item in df['ID1'].iloc[i]:
        if item in lst:
            indexSaver.append(i)
Desired output: In this case, item a and item b appear in row 0 (twice) and in row 1. The desired output would then be [0, 0, 1]; that said, I could work with an output of [0, 1] as well.
Does anyone have a more elegant idea?
I assume you meant the desired output is [1, 1, 0], i.e. a 0/1 flag per row, but you can reverse the logic if needed:
# 1 if the row's frozenset shares at least one item with ['a', 'b'], else 0
df['indexSaver'] = df['ID1'].apply(lambda f: 1 if len(f.intersection(['a', 'b'])) > 0 else 0)
If you strictly need it as a list:
indexSaver = df['indexSaver'].tolist()
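If you do want the row positions from the question ([0, 1] or [0, 0, 1]) rather than a 0/1 flag, something along these lines might work. This is only a sketch on the sample data: the names wanted, mask, counts, unique_rows and repeated_rows are made up for illustration, and I have not timed it on 27 million rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID1': [frozenset(['a', 'b']), frozenset(['a', 'c']), frozenset(['c', 'd'])]})
lst = ['a', 'b']
wanted = set(lst)  # set lookups are faster than scanning a list

# Boolean mask: True where the row shares at least one item with lst
mask = np.array([not wanted.isdisjoint(fs) for fs in df['ID1']])
# Positions of the matching rows, one entry per row
unique_rows = np.flatnonzero(mask).tolist()

# Number of matching items per row, so a row with two hits is repeated twice
counts = np.array([len(wanted & fs) for fs in df['ID1']])
repeated_rows = np.repeat(np.arange(len(df)), counts).tolist()

print(unique_rows)
print(repeated_rows)
```

Note that these are positions (0, 1, 2, ...), which match the index labels only as long as df has the default RangeIndex.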