Search code examples
pythondataframecomparisonmissing-data

how to print missing rows of a dataframe versus another


I want to find rows that are missing in the first dataframe.

So I have two dataframes:

data1 = {'first_column':  ['2', '3'],
        'second_column': ['2', '2'],
       'third_column':['2', '1'],
        }

data2 = {'first_column':  ['2', '2'],
        'second_column': ['2', '2'],
       'third_column':['2', '2'],
        }

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

expected output:

first_column second_column third_column
           3             2            1

Solution

  • If I understood your requirement properly then, One way to do it using drop_duplicates and merge. Finally, pick only the left_only.

    import pandas as pd
    data1 = {'first_column':  ['2', '3'],
            'second_column': ['2', '2'],
           'third_column':['2', '1'],
            }
    
    data2 = {'first_column':  ['2', '2'],
            'second_column': ['2', '2'],
           'third_column':['2', '2'],
            }
    
    df1 = pd.DataFrame(data1)
    df2 = pd.DataFrame(data2)
    
    df = df1.drop_duplicates().merge(df2.drop_duplicates(), on=df2.columns.to_list(), 
                       how='left', indicator=True)
    final_result = df.loc[df._merge=='left_only',df.columns!='_merge']
    print(final_result.reset_index())
    

    Output:

      first_column second_column third_column
    0            3             2            1