Search code examples
pythonpandasdataframelookup

how to lookup between two dataframes and update the value in another df


I have a dataframe df1 like below -

|email_id| date |
|abc@gmail.com | ['2022-04-09'] |
|ABC@gmail.com | [nan]
|def@gmail.com | ['2022-09-21','2022-03-09'] |
|ghi@gmail.com | [nan, '2022-03-29'] |
|jkl@gmail.com | [nan] |
|mnp@gmail.com | [nan,'2022-09-01']

Another df df2 -

|email_id| status |
|abc@gmail.com | 0 |
|def@gmail.com | 0 |
|ghi@gmail.com | 0 |
|mno@gmail.com | 3 |
|pqr@gmail.com | 2 |
|MNP@gmail.com | 1 |

How can I lookup email_id from df2 in df1 and update the status in df2? If we have the date values present in df1 date column , status for that email_id should be 0, and if we have any nan values present, the status should be 1. If some email_id from df2 doesn't match in df1 , will keep the status as same.

Expected output of df2 -

|email_id| status |
|abc@gmail.com | 1 |
|def@gmail.com | 0 |
|ghi@gmail.com | 1 |
|mno@gmail.com | 3 |
|pqr@gmail.com | 2 |
|MNP@gmail.com | 1 |

Please help me out. Thanks in advance!


Solution

  • First use DataFrame.explode for column from lists, then create compare for missing values with aggregate max for mapping Series, use Series.map with replace non matched values to original column df2['status']:

    df = df1.explode('date')
    s = df['date'].isna().astype(int).groupby(df['email_id'].str.lower()).max()
    print (s)
    email_id
    abc@gmail.com    1
    def@gmail.com    0
    ghi@gmail.com    1
    jkl@gmail.com    1
    mnp@gmail.com    1
    Name: date, dtype: int32
    
    df2['status'] = df2['email_id'].str.lower().map(s).fillna(df2['status']).astype(int)
    print (df2)
            email_id  status
    0  abc@gmail.com       1
    1  def@gmail.com       0
    2  ghi@gmail.com       1
    3  mno@gmail.com       3
    4  pqr@gmail.com       2
    5  MNP@gmail.com       1