Search code examples
pythonpython-3.xpandasdataframepycharm

Comparing two dataframes and only showing the unmatched column records


I have a list of dictionaries converted to Pandas Dataframe I am able to print the unmatched record but I don't want the whole record but only the mismatched column of the record.

{'A':5,'B':6,'C': 7}] list 2 is: [{'A':5,'B':8,'C': 7}],. I want to get only output of B where the mismatch is. Assuming list of dictionaries will have multiple dictionaries. I have two dataframes and am comparing to find the non matched records.

I need to get an idea on how to do that

Possible solutions tried:-

Finding the common records and dropping from dataframe but I am getting entire rows.

However I only need columns only that have unmatched values.

Please note there are around 50 columns

for df1

         Date   Fruit   Num   Color
0  2013-11-24  Banana  22.1  Yellow
1  2013-11-24  Orange   8.6  Orange
2  2013-11-24   Apple   7.6   Green
3  2013-11-24  Celery  10.2   Green

for df2

         Date   Fruit   Num   Color
0  2013-11-24  Banana  22.1  Orange
1  2013-11-24  Orange   8.6  Orange
2  2013-11-24   Apple   7.6   Green
3  2013-11-24  Celery  10.2   Green

for df_diff

Color
1 Orange

Solution

  • So, given the following dictionaries:

    dict_1 = {
        "Date": {0: "2013-11-24", 1: "2013-11-24", 2: "2013-11-24", 3: "2013-11-24"},
        "Fruit": {0: "Banana", 1: "Orange", 2: "Apple", 3: "Celery"},
        "Num": {0: 22.1, 1: 8.6, 2: 7.6, 3: 10.2},
        "Color": {0: "Yellow", 1: "Orange", 2: "Green", 3: "Green"},
    }
    
    dict_2 = {
        "Date": {0: "2013-11-03", 1: "2013-11-24", 2: "2013-11-24", 3: "2013-11-24"},
        "Fruit": {0: "Banana", 1: "Orange", 2: "Citrus", 3: "Celery"},
        "Num": {0: 22.1, 1: 2.2, 2: 7.6, 3: 0.2},
        "Color": {0: "Orange", 1: "Orange", 2: "Green", 3: "Green"},
    }
    

    You could find the differences like this:

    diff_dict = {}
    for outer_key, inner_dict in dict_1.items():
        diff_dict[outer_key] = {}
        for inner_key, inner_value in inner_dict.items():
            if (other_value := dict_2[outer_key][inner_key]) != inner_value:
                diff_dict[outer_key][inner_key] = other_value
            else:
                diff_dict[outer_key][inner_key] = "-"
    

    And then visualize them with Pandas:

    import pandas as pd
    
    print(pd.DataFrame(diff_dict))
    # Output
    
             Date   Fruit  Num   Color
    0  2013-11-03       -    -  Orange
    1           -       -  2.2       -
    2           -  Citrus    -       -
    3           -       -  0.2       -