Search code examples
pythonpandasdataframecomparisonmismatch

column comparison of two dataframe, return df with mismatches python


I want to print two dataframes that print the rows where there is a mismatch in a given column, here the "second_column": "first_column" is a key value that identify same product in both dataframes

    import pandas as pd
    
    data1 = {
        'first_column': ['id1', 'id2', 'id3'],
        'second_column': ['1', '2', '2'],
        'third_column': ['1', '2', '2'],
        'fourth_column': ['1', '2', '2']
    }
    
    df1 = pd.DataFrame(data1)
    
    print(df1)
    
    test = df1['second_column'].nunique()
    
    data2 = {
        'first_column': ['id1', 'id2', 'id3'],
        'second_column': ['3', '4', '2'],
        'third_column': ['1', '2', '2'],
        'fourth_column': ['1', '2', '2']
    }
    
    df2 = pd.DataFrame(data2)
    print(df2)

enter image description here

expected output:

enter image description here


Solution

  • IIUC btw, you screenshots don't match your DF definition

    df1.loc[~df1['second_column'].isin(df2['second_column'])]
    
    
    first_column    second_column   third_column    fourth_column
    0   1   1   1   1
    
    df2.loc[~df2['second_column'].isin(df1['second_column'])]
    
    first_column    second_column   third_column    fourth_column
    0   1   3   1   1
    1   2   4   2   2