I have a dataframe df1
like below -
|email_id| date |
|abc@gmail.com | ['2022-04-09'] |
|ABC@gmail.com | [nan]
|def@gmail.com | ['2022-09-21','2022-03-09'] |
|ghi@gmail.com | [nan, '2022-03-29'] |
|jkl@gmail.com | [nan] |
|mnp@gmail.com | [nan,'2022-09-01']
Another df df2
-
|email_id| status |
|abc@gmail.com | 0 |
|def@gmail.com | 0 |
|ghi@gmail.com | 0 |
|mno@gmail.com | 3 |
|pqr@gmail.com | 2 |
|MNP@gmail.com | 1 |
How can I lookup email_id from df2 in df1 and update the status in df2? If we have the date values present in df1 date column , status for that email_id should be 0, and if we have any nan values present, the status should be 1. If some email_id from df2 doesn't match in df1 , will keep the status as same.
Expected output of df2 -
|email_id| status |
|abc@gmail.com | 1 |
|def@gmail.com | 0 |
|ghi@gmail.com | 1 |
|mno@gmail.com | 3 |
|pqr@gmail.com | 2 |
|MNP@gmail.com | 1 |
Please help me out. Thanks in advance!
First use DataFrame.explode
for column from lists, then create compare for missing values with aggregate max
for mapping Series, use Series.map
with replace non matched values to original column df2['status']
:
df = df1.explode('date')
s = df['date'].isna().astype(int).groupby(df['email_id'].str.lower()).max()
print (s)
email_id
abc@gmail.com 1
def@gmail.com 0
ghi@gmail.com 1
jkl@gmail.com 1
mnp@gmail.com 1
Name: date, dtype: int32
df2['status'] = df2['email_id'].str.lower().map(s).fillna(df2['status']).astype(int)
print (df2)
email_id status
0 abc@gmail.com 1
1 def@gmail.com 0
2 ghi@gmail.com 1
3 mno@gmail.com 3
4 pqr@gmail.com 2
5 MNP@gmail.com 1