Search code examples
pythonpandasif-statementmatchingdata-comparison

Loop over columns to compare two columns at a time in a dataframe and add one summary column based on each comparison result


df1.csv

df2.csv

After executing the code below,

df1.merge(df2, on=['Customer_id'], indicator='Status', how='outer')

I get,

df3.csv

Now I want to compare whether all elements in Name_x is equal to Name_y or not, similarly for other columns as well. What i am doing is comparing it individually. Here is my code:

df3.assign(Status1=df3.Name_x == df3.Name_y)

df3.assign(Status2=df3.Age_x == df3.Age_y)

and so on

After doing above i get,

df4.csv

Can i write for loop so that i do not have to write the code

df3.assign()

individually for comparing every two columns?

and most importantly my final status column should look like this,

output

what i am doing is thinking to write if elif like this:

def f(row):

if row['Status1'] == 'True' and row['Status2'] == 'False':
    val = 'Matching for column {Name}'
elif row['Status1'] == 'False' and row['Status1'] == 'True':
    val = 'Matching for column {Age}
elif row['Status1'] == 'False' and row['Status1'] == 'False':
    val = 'Not Matching at all' #and so on...
return val

This is okay if one want to write if elif only on two columns Status1 and Status2 and becomes lengthy if one want to write if elif on more than 2 columns. Can i get the desired output in few lines of code if i have column Status3 also?


Solution

  • Use pandas.merge

    Read both csv into pandas

    df1=pd.Dataframe.from_csv('file')
    df2=pd.Dataframe.from_csv('file2')
    

    then Merge using an "outer" join

    df3=pd.merge(df1,df2,on=[key1,key2 etc], how="outer", indicator=True) 
    

    At this point you should be 90% done