python python-3.x pandas dataframe floating-accuracy

How to compare two CSV files using DataFrames and retrieve the different cells? Why am I getting so many decimals figures in the float cells?

I have a CSV file such as this one (example.csv).

STRING_COL,INT_1,INT_2,FLOAT,INT_3
Hello,9,65151651,3234.54848,7832
This is a string,2,5484651,34.234,-999
Another,2,62189548,51.51658,-999
Test,2,2131514,5.2156,-999
Ham,9,6546548,2.15,-999
String,9,3216546,2.15468,-999

Every cell has a different number of decimals. They can be strings as well or integers (Int64, Int8, ...). Then I also have a similar CSV, but with some values changed. I want to check de differences between both files.

Therefore I have written a code similar to this one to compare the values cell by cell:

import pandas as pd

df = pd.read_csv(
    'example.csv', delimiter=',', comment='#', skip_blank_lines=True,
    verbose=False, engine='python', dtype=str
)
df = df.apply(lambda x: pd.to_numeric(x, errors='ignore', downcast='integer'))

df_2 = pd.read_csv(
    'example_2.csv', delimiter=',', comment='#', skip_blank_lines=True,  # file with small changes
    verbose=False, engine='python', dtype=str
)
df_2 = df_2.apply(lambda x: pd.to_numeric(x, errors='ignore', downcast='integer'))

for i in list(df.index):
    for column in list(df.columns):
        old = df.loc[i, column]
        new = df_2.loc[i, column]
        if old != new:
            print('DIFFERENT VALUE >> INDEX: {} | OLD: {} | NEW: {}'.format(i, old, new))

If you run this example with the small CSV file I am pretty sure it is going to work well. But with a huge CSV file some weird things are happening. I do not understand why sometimes many values are truncated to this ones:

1.6440000000000001  >> original value 1.644
7.7189999999999985  >> original value 7.7189

Then if I compare them it finds that they are different and this is not true because the value is the same. What is happening? Is there a way to fix this? Is there a better way to compare values with DataFrames?

NOTE: Maybe I am doing anything wrong in other part of my original code, but I think I have written the most important and relevant.

NOTE 2: I take into account that the != operator does not work well with NaN values. I use np.isnan to check this changes.

Update. I do not need to compare and say "yes, it is equal" and "no, it is not equal". I need to retrieve the values that have changes cell by cell.

Solution

Finally I found something to make the comparison in an appropriate way: np.isclose(). I have read the duplicated question I found and some other questions about the epsilon value: numpy.finfo(), epsilon

Epsilon: Numbers which differ by less than machine epsilon are numerically the same

    abs(a - b) < epsilon
    absolute(a - b) <= (atol + rtol * absolute(b))      # np.isclose() method

So I need to make something like this. I have to check what happens if I am comparing between float32 and float64 or float16

eps64 = np.finfo(np.float64).eps
for col in df.columns:
    np.isclose(
        df[col],
        df_2[col],
        equal_nan=False,
        atol=0.0,
        rtol=eps64
    )

But now I am facing the problem that if I want copy the value to other variable I copy the inaccurate value 1.6440000000000001. What I am doing now to fix this is to cast the value to float >> float(1.6440000000000001)