Search code examples
pythonpandasdataframelogical-operators

Finding similar rows in two dataframes using pandas


I have two data frames, The first one is the root data frame, second one is obtained from first data frame (which is based on a pattern that "Name" must be repeated 3 times and "Subset" must of the pattern as shown in dataframe 2 below).

Based on these two dataframes, i need to add a "Remark" column to root dataframe and the column writes "Yes" is particular row in dataframe 1 is present in dataframe 2, else it should be blank.

DataFrame1-

Name    Subset            Value
A       67-A-5678          14
A       58-ABC-87555       187
A       45-ASH-87954       5465
T       78-A-8793          4533
T       52-O-5642          5644
S       34-A-8785          454
S       58-ASO-98978       54
S       23-ASH-87895       784
X       98-X-87876         455
X       87-ABC-54578       4545
X       56-ASH-89667       854
Y       09-D-98644         45
Y       87-ABC-78834       98
Y       87-ASH-87455A      4566
L       67-A-87545         78
L       89-GHS-08753       12 
L       78-PHU-09876       655 

DataFrame2-

Name    Subset             Value
A       67-A-5678          14       
A       58-ABC-87555       187      
A       45-ASH-87954       5465     
X       98-X-87876         455
X       87-ABC-54578       4545
X       56-ASH-89667       854
Y       09-D-98644         45
Y       87-ABC-78834       98
Y       87-ASH-87455A      4566

Output Dataframe-

Name    Subset            Value    Remark
A       67-A-5678          14      Yes
A       58-ABC-87555       187     Yes 
A       45-ASH-87954       5465    Yes
T       78-A-8793          4533
T       52-O-5642          5644
S       34-A-8785          454
S       58-ASO-98978       54
S       23-ASH-87895       784
X       98-X-87876         455     Yes
X       87-ABC-54578       4545    Yes
X       56-ASH-89667       854     Yes
Y       09-D-98644         45      Yes
Y       87-ABC-78834       98      Yes
Y       87-ASH-87455A      4566    Yes
L       67-A-87545         78
L       89-GHS-08753       12 
L       78-PHU-09876       655

P.S. Actual dataset can be of many columns/rows.


Solution

  • You can merge with indicator=True and use the "both" property to get the matching rows:

    (df1.merge(df2, on=list(df1.columns), how='left', indicator=True)
        .assign(**{'Remark Added': lambda d: d['_merge'].eq('both').map({True: 'Yes', False: ''})})
        .drop(columns='_merge')
    )
    
    

    NB. to see how this works, comment the .drop(columns='_merge') line to keep the temporary _merge column

    output:

       Name         Subset  Value Remark Added
    0     A      67-A-5678     14          Yes
    1     A   58-ABC-87555    187          Yes
    2     A   45-ASH-87954   5465          Yes
    3     T      78-A-8793   4533             
    4     T      52-O-5642   5644             
    5     S      34-A-8785    454             
    6     S   58-ASO-98978     54             
    7     S   23-ASH-87895    784             
    8     X     98-X-87876    455          Yes
    9     X   87-ABC-54578   4545          Yes
    10    X   56-ASH-89667    854          Yes
    11    Y     09-D-98644     45          Yes
    12    Y   87-ABC-78834     98          Yes
    13    Y  87-ASH-87455A   4566          Yes
    14    L     67-A-87545     78             
    15    L   89-GHS-08753     12             
    16    L   78-PHU-09876    655