Search code examples
pythonpython-3.xpandascompare

Compare two Dataframe columns and retrive the mismatched columns in a new column


I am trying to compare two dataframes and create one column in the first dataframe says that mismatched columns. attaching the Dataframe images and expected output images below. Could I get some help here

enter image description here attaching the expected outputs here.Need one column says that names of all mismatched columns


Solution

  • Assuming aligned indices, you can use:

    df1['mismatch'] = df1.ne(df2).dot(df1.columns+',').str[:-1]
    

    Alternatively:

    out = (df1.ne(df2).reset_index().melt('index')
           .query('value').groupby('index')
           ['variable'].agg(','.join)
          )
    
    df1['mismatch'] = out
    

    Example output:

       A  B  C mismatch
    0  1  2  3      A,B
    1  4  5  6        A
    2  7  8  9         
    

    Used inputs:

    # df1
       A  B  C
    0  1  2  3
    1  4  5  6
    2  7  8  9
    
    # df2
       A  B  C
    0  0  0  3
    1  0  5  6
    2  7  8  9