Tags: python, pandas, list, frozenset

Check if item of frozenset in list


I have a dataset that consists of a single column of frozensets.

Data

import pandas as pd
import numpy as np
d = {'ID1': [frozenset(['a', 'b']), frozenset(['a','c']), frozenset(['c','d'])]}
df = pd.DataFrame(data=d)

Furthermore, I have a list of letters, and I would like to get a list of the row indices in the dataset where an item from the list appears. So assume the following list:

lst = ['a', 'b']
indexSaver = []

I can do this with a for loop, but the dataset has over 27 million rows, so I'm fairly sure a faster approach would save me some time.

for i in range(len(df)):
    for item in df['ID1'].iloc[i]:
        if item in lst:
            indexSaver.append(i)

Desired output: in this case, items a and b appear in row 0 (twice) and in row 1, so the desired output would be [0, 0, 1]. That said, I could also work with [0, 1].

Does anyone have a more elegant idea?


Solution

  • I assume a per-row flag such as [1, 1, 0] (1 where the frozenset contains an item from the list, 0 otherwise) is acceptable as output; you can reverse the logic if needed.

     # 1 if the row's frozenset shares at least one item with lst, else 0
     df['indexSaver'] = df['ID1'].apply(lambda f: 1 if f.intersection(lst) else 0)


    If you strictly need it as a list:

    indexSaver = df['indexSaver'].tolist()
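
    If you need the row positions themselves, as in the question ([0, 1] or [0, 0, 1]), rather than a 0/1 flag, something along these lines may work. This is a minimal sketch, not a benchmarked solution: the names wanted, mask, unique_rows and repeated_rows are made up for illustration, and it still iterates over the column in Python, so the gain over the original loop comes mainly from set lookups and from avoiding .iloc on every row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID1': [frozenset(['a', 'b']),
                           frozenset(['a', 'c']),
                           frozenset(['c', 'd'])]})
lst = ['a', 'b']

# Hypothetical helper name: a set gives constant-time membership checks.
wanted = set(lst)

# Boolean mask: True where the row's frozenset shares an item with `wanted`.
mask = np.fromiter((not fs.isdisjoint(wanted) for fs in df['ID1']),
                   dtype=bool, count=len(df))

# Each matching row position once.
unique_rows = np.flatnonzero(mask).tolist()

# One entry per matching item, mirroring the nested loop in the question.
repeated_rows = [i for i, fs in enumerate(df['ID1']) for _ in fs & wanted]

print(unique_rows)    # [0, 1]
print(repeated_rows)  # [0, 0, 1]
```

    Both results are positions (like range(len(df)) in the question). If the DataFrame has a non-default index and you want its labels instead, df.index[mask].tolist() should give them.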