Search code examples
pythonpandasdataframecomparison

one to one column-value comparison between 2 dataframes - pandas


I have 2 dataframe -

print(d)
   Year  Salary  Amount  Amount1  Amount2
0  2019    1200      53       53       53
1  2020    3443     455      455      455
2  2021    6777     123      123      123
3  2019    5466     313      313      313
4  2020    4656     545      545      545
5  2021    4565     775      775      775
6  2019    4654     567      567      567
7  2020    7867     657      657      657
8  2021    6766     567      567      567

print(d1)

   Year  Salary  Amount  Amount1  Amount2
0  2019    1200      53       73       63
import pandas as pd

d = pd.DataFrame({
    'Year': [
                2019,
                2020,
                2021,
            ] * 3,
    'Salary': [
        1200,
        3443,
        6777,
        5466,
        4656,
        4565,
        4654,
        7867,
        6766
    ],
    'Amount': [
        53,
        455,
        123,
        313,
        545,
        775,
        567,
        657,
        567
    ],
    'Amount1': [
        53,
        455,
        123,
        313,
        545,
        775,
        567,
        657,
        567
    ], 'Amount2': [
        53,
        455,
        123,
        313,
        545,
        775,
        567,
        657,
        567
    ]
})

d1 = pd.DataFrame({
    'Year': [
        2019
    ],
    'Salary': [
        1200
    ],
    'Amount': [
        53
    ],
    'Amount1': [
        73
    ], 'Amount2': [
        63
    ]
})

I want to compare the 'Salary' value of dataframe d1 i.e. 1200 with all the values of 'Salary' in dataframe d and set a count if it is >= or < (a Boolean comparison) - this is to be done for all the columns(amount, amount1, amount2 etc), if the value in any column of d1 is NaN/None, no comparison needs to be done. The name of the columns will always be same so it is basically one to one column comparison.

My approach and thoughts - I can get the values of d1 in a list by doing -

l = []
for i in range(len(d1.columns.values)):
    if i == 0:
        continue
    else:
        num = d1.iloc[0, i]
        l.append(num)
print(l)

# list comprehension equivalent

lst = [d1.iloc[0, i] for i in range(len(d1.columns.values)) if i != 0]


[1200, 53, 73, 63]

and then use iterrows to iterate over all the columns and rows in dataframe d OR I can iterate over d and then perform a similar comparison by looping over d1 - but these would be time consuming for a high dimensional dataframe(d in this case). What would be the more efficient or pythonic way of doing it?


Solution

  • IIUC, you can do:

    (df1 >= df2.values).sum()
    

    Output:

    Year       9
    Salary     9
    Amount     9
    Amount1    8
    Amount2    8
    dtype: int64