python arrays pandas comparison

Comparing two data frames with a given tolerance range


I have two DataFrames with the same number of columns and the same number of rows. I'm comparing the first row in df1 with the first row in df2, the second row in df1 with the second row in df2, and so on, to count how many features differ in each row. This code works, but it only considers exact matches.

Df1:

    var1  var2  var 3
1   30     65    100
2   40     32    200
3   25     64    500

Df2:

    var1  var2  var 3
1   30     65    100
2   80     77    50
3   22     60    499

In: import numpy as np

    differences = np.zeros(len(df1))
    for i in df1:  # iterating over a DataFrame yields its column names
        differences += np.where(df1[i] != df2[i], 1, 0)
    print(differences)

The output is an array giving the number of differing values in each row:

In: print(differences)
         [0. 3. 3.]

This works, but I want to allow a tolerance when comparing the values. The values do not have to be exactly equal; I would like a tolerance of 5. For example, if the value in df1 is 25 and the value in df2 is 22, they should count as the same. The desired output is:

In: print(differences)
         [0. 3. 0.]

because in the third row of df1 and df2, every pair of values falls within the tolerance of 5. Any idea how to implement this?


Solution

  • Try np.isclose() with an absolute tolerance (atol) in place of the exact != comparison:

    differences = np.zeros(len(df1))
    for i in df1:
        differences += np.where(~np.isclose(df1[i], df2[i], atol=5), 1, 0)
    print(differences)
    

    Output:

    [0. 3. 0.]
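
    The same count can also be computed without the column loop. The sketch below is one possible variant, not the only way to do it: it rebuilds the sample frames from the question, and count_differences is a hypothetical helper name introduced here for illustration. It passes rtol=0 on the assumption that the tolerance should be purely absolute; np.isclose otherwise tests |a - b| <= atol + rtol * |b| with a default rtol of 1e-05, which widens the band slightly for large values.

```python
import numpy as np
import pandas as pd

# Sample frames copied from the question.
df1 = pd.DataFrame(
    {"var1": [30, 40, 25], "var2": [65, 32, 64], "var 3": [100, 200, 500]},
    index=[1, 2, 3],
)
df2 = pd.DataFrame(
    {"var1": [30, 80, 22], "var2": [65, 77, 60], "var 3": [100, 50, 499]},
    index=[1, 2, 3],
)


def count_differences(a, b, tol):
    """Hypothetical helper: per-row count of cells that differ by more than tol."""
    # rtol=0 switches off the relative term, so only the absolute tolerance applies.
    close = np.isclose(a.to_numpy(), b.to_numpy(), rtol=0, atol=tol)
    # Negate the mask and sum across columns to count mismatches in each row.
    return (~close).sum(axis=1)


differences = count_differences(df1, df2, tol=5)
print(differences)  # [0 3 0]
```

    This compares the frames by position, so it assumes both have the same shape and column order, as in the question. With tol=0 it reproduces the exact-match counts from the original loop.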