Search code examples
pythonpandasdataframemerge

Merging two dataframes but based on string match condition


In python, I have two dataframes.

df1 looks like below:

ID      Limit       Comment
12      200         ['Normal']
23      202         ['Emergency']
23      203         ['Normal']
43      304         ['Emergency']
12      500         ['Emergency']
24      100         ['Emergency']

df2 looks like below.

Name          Title      ID     Limit Type
BASE CASE     title1     12      X
NOT_BASE      title12    23      N
NOT_BASE_1    title 23   43      X
BASE CASE     title 23   23      X
NOT_BASE_2    title23    12      X

I want to merge the dataframes on ID, but in df2 where BASE CASE is listed under the Name column, it should list the Limit value listed against Normal in the Comment column. If it is not BASE CASE, then use the limit listed against the Emergency. Therefore, the output should look like this.

df3

Name          Title      ID     Limit Type   Limit
BASE CASE     title1     12      X           200
NOT_BASE      title12    23      N           202
NOT_BASE_1    title 23   43      X           304
BASE CASE     title 23   23      X           203
NOT_BASE_2    title23    12      X           500

Solution

  • assign a new column to serve a secondary key for your merge ("Normal" if "BASE CASE" else "Emergency"):

    import numpy as np
    
    out = (df2.assign(Comment=np.where(df2['Name'].eq('BASE CASE'),
                                       'Normal', 'Emergency'))
              .merge(df1, on=['ID', 'Comment'], how='left')
             #.drop(columns=['Comment']) # optional
           )
    

    Output:

             Name     Title  ID Limit Type    Comment  Limit
    0   BASE CASE    title1  12          X     Normal    200
    1    NOT_BASE   title12  23          N  Emergency    202
    2  NOT_BASE_1  title 23  43          X  Emergency    304
    3   BASE CASE  title 23  23          X     Normal    203
    4  NOT_BASE_2   title23  12          X  Emergency    500
    

    Reproducible inputs:

    df1 = pd.DataFrame({'ID': [12, 23, 23, 43, 12, 24],
                        'Limit': [200, 202, 203, 304, 500, 100],
                        'Comment': ['Normal', 'Emergency', 'Normal', 'Emergency', 'Emergency', 'Emergency']})
    df2 = pd.DataFrame({'Name': ['BASE CASE', 'NOT_BASE', 'NOT_BASE_1', 'BASE CASE', 'NOT_BASE_2'],
                        'Title': ['title1', 'title12', 'title 23', 'title 23', 'title23'],
                        'ID': [12, 23, 43, 23, 12],
                        'Limit Type': ['X', 'N', 'X', 'X', 'X']})