python pandas csv isin

How to use the pandas 'isin' function to give actual values of the df row instead of a boolean expression?


I have two dataframes and I'm comparing their columns labeled 'B'. If the value of column B in df2 matches the value of column B in df1, I want to extract the value of column C from df2 and add it to a new column in df1. Example:

df1: [image not preserved]

df2: [image not preserved]

Expected result of df1: [image not preserved]

I've tried the following. I know that it only checks whether each value of column B in df2 also appears in column B of df1, so the 'New' column ends up holding True/False. Is there a way to put the value from column 'C' of df2 into the 'New' column of df1 when there's a match, instead of the boolean?

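import pandas as pd

df1 = pd.read_csv('df1.csv')
df2 = pd.read_csv('df2.csv')

# isin returns a boolean Series, so 'New' holds True/False rather than values from C
df1['New'] = df2['B'].isin(df1['B'])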


Solution

  • import pandas as pd
    
    df1 = pd.DataFrame({'B': ['a', 'b', 'f', 'd', 'h'], 'C':[1, 5, 777, 10, 3]})
    df2 = pd.DataFrame({'B': ['k', 'l', 'f', 'j', 'h'], 'C':[0, 9, 555, 15, 1]})
    ind = df2[df2['B'].isin(df1['B'])].index
    df1.loc[ind, 'new'] = df2.loc[ind, 'C']
    

    df2

       B    C
    0  k    0
    1  l    9
    2  f  555
    3  j   15
    4  h    1
    

    Output df1

       B    C    new
    0  a    1    NaN
    1  b    5    NaN
    2  f  777  555.0
    3  d   10    NaN
    4  h    3    1.0
    

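    Here, ind holds the index labels of the rows in df2 whose B value also appears in df1. In each loc call, the first argument selects rows by those labels and the second selects the column. Note that the assignment is aligned on index labels, so it gives the right result only when matching rows sit at the same index in both frames, as 'f' (index 2) and 'h' (index 4) do in this example.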
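If the matching rows are not at the same positions in the two frames, a lookup keyed on B may be more robust than assigning by index. The following is a minimal sketch of that idea using Series.map, not the method from the answer above. The reordered df2 and the name lookup are made up for illustration, and the sketch assumes each B value should map to a single C value (duplicates in df2 are dropped, keeping the first).

```python
import pandas as pd

df1 = pd.DataFrame({'B': ['a', 'b', 'f', 'd', 'h'], 'C': [1, 5, 777, 10, 3]})
# Same rows as the df2 above, reordered so index positions no longer line up (illustrative).
df2 = pd.DataFrame({'B': ['h', 'k', 'l', 'j', 'f'], 'C': [1, 0, 9, 15, 555]})

# Build a lookup Series: index is df2['B'], values are df2['C'].
# Assumption: one C per B; drop_duplicates keeps the first occurrence of each B.
lookup = df2.drop_duplicates('B').set_index('B')['C']

# For each value in df1['B'], fetch the matching C from df2 (NaN where there is no match).
df1['new'] = df1['B'].map(lookup)
print(df1)
```

With this data, the 'f' and 'h' rows of df1 receive 555 and 1 from df2, and the other rows are left as NaN. If several columns need to be carried over at once, a left merge on B is a common alternative.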