Search code examples
stringpandasprecisionfloating-accuracy

Pandas : Precision error when converting string to float


Using pandas to deal with timestamps, I am concatening two columns and then convert the result in floating. It appears that when I display the two columns I observe two different results. How can the conversion from string to float can affect the value? Thanks for your help.

Here is the content of the data.csv file

epoch_day,epoch_ns
1533081601,224423000

Here is my test program:

import pandas as pd
pd.options.display.float_format = '{:.10f}'.format
df_mid = pd.read_csv("data.csv")

df_mid['result_1']=df_mid['epoch_day'].astype(str).str.cat(df_mid['epoch_ns'].astype(str), sep =".")
df_mid['result_2'] = df_mid['epoch_day'].astype(str).str.cat(df_mid['epoch_ns'].astype(str), sep =".").astype(float)
print(df_mid)

The result is :

   epoch_day   epoch_ns              result_1              result_2
0  1533081601  224423000  1533081601.224423000 1533081601.2244229317

Thanks for your help

FX


Solution

  • Floating-point numbers are represented in computer hardware as base 2 (binary) fractions. Most decimal fractions cannot be represented exactly as binary fractions.

    When you convert your string, python creates a float which is the closest binary fraction for your input.

    You can actually see to which decimal number this corresponds by running the following:

    from decimal import Decimal
    Decimal(1533081601.224423000)
    OUTPUT: Decimal('1533081601.224422931671142578125')
    

    You can see the Python documentation for more info https://docs.python.org/2/tutorial/floatingpoint.html