Python - keep rows in dataframe based on partial string match

I have two dataframes:
df1 is a list of mailboxes and email IDs
df2 is a list of approved domains

I read both dataframes from an Excel sheet:

    xls = pd.ExcelFile(input_file_shared_mailbox)
    df = pd.read_excel(xls, sheet_name = sheet_name_shared_mailbox)

I want to keep only the records in df1 where df1['Email_Id'] contains one of the values in df2['approved_domain']:

    print(df1)  
        Mailbox Email_Id  
    0   mailbox1   [email protected]  
    1   mailbox2   [email protected]  
    2   mailbox3   [email protected]  

    print(df2)  
        approved_domain  
    0   msn.com  
    1   gmail.com

and I want df3, which would show:

    print (df3)  
        Mailbox Email_Id  
    0   mailbox1   [email protected]  
    1   mailbox3   [email protected]

This is the code I have right now. I think it is close, but I can't figure out exactly what is wrong with the syntax:

    df3 = df1[df1['Email_Id'].apply(lambda x: [item for item in x if item in df2['Approved_Domains'].tolist()])]

But I get this error:

    TypeError: unhashable type: 'list'

I spent a lot of time searching the forum for a solution but could not find what I was looking for. I appreciate any help.

Solution

These are the steps to follow to get what you want from your two dataframes:

1. Split your Email_Id column into two separate columns: the part before the '@' and the domain after it

      df1[['add', 'domain']] = df1['Email_Id'].str.split('@', n=1, expand=True)

2. Then drop the 'add' column to keep your dataframe clean

      df1 = df1.drop('add',axis =1)

3. Build a new dataframe that keeps only the rows whose 'domain' value appears in the 'approved_domain' column of df2

      df_new = df1[df1['domain'].isin(df2['approved_domain'])]

4. Drop the 'domain' column in df_new

      df_new = df_new.drop('domain', axis=1)

This is what the result will be:

        Mailbox    Email_Id
    0   mailbox1   [email protected]
    2   mailbox3   [email protected]
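
For reference, the steps above can be condensed into one runnable sketch that does not add helper columns to df1. The sample addresses below are hypothetical placeholders (the real ones are not shown in the question), and the lower-casing and whitespace stripping are assumptions about how messy the spreadsheet data might be, not something the question states.

```python
import pandas as pd

# Hypothetical sample data standing in for the two Excel sheets.
df1 = pd.DataFrame({
    "Mailbox": ["mailbox1", "mailbox2", "mailbox3"],
    "Email_Id": ["user1@msn.com", "user2@example.org", "user3@gmail.com"],
})
df2 = pd.DataFrame({"approved_domain": ["msn.com", "gmail.com"]})

# Take the text after the last "@" as the domain, normalised for comparison.
domains = df1["Email_Id"].str.rsplit("@", n=1).str[-1].str.strip().str.lower()
approved = df2["approved_domain"].str.strip().str.lower()

# Boolean mask: True where the row's domain is in the approved list.
mask = domains.isin(approved)

# Keep the matching rows and renumber the index from 0.
df3 = df1[mask].reset_index(drop=True)
print(df3)
```

Note that isin compares whole domains, so an address at a subdomain such as mail.msn.com would not match msn.com. If subdomains should count as approved, a suffix or regex match would be needed instead.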