python pandas csv floating-point rounding

Pandas read csv file with float values results in weird rounding and decimal digits


I have a csv file containing numerical values such as 1524.449677. There are always exactly 6 decimal places.

When I import the csv file (together with other columns) via pandas read_csv, the column automatically gets the datatype object. My issue is that values are shown as 2470.6911370000003 when they should be 2470.691137, or as 2484.3069100000002 when they should be 2484.30691.

This seems to be a data type issue of some kind. I tried explicitly specifying the data type in read_csv by passing the dtype argument as {'columnname': np.float64}, but the issue did not go away.

How can I get the values imported and shown exactly as they are in the source csv file?


Solution

  • Pandas uses a dedicated decimal-to-binary converter that trades accuracy for speed.

    Passing float_precision='round_trip' to read_csv fixes this.

    Check out this page for more detail on this.

    After processing your data, if you want to save it back to a csv file, you can pass
    float_format="%.nf" to to_csv, where n is the number of decimal places you want.

    A full example:

    import pandas as pd

    # source_file and target_file are placeholders for your own csv paths
    df_in  = pd.read_csv(source_file, float_precision='round_trip')
    df_out = ... # some processing of df_in
    df_out.to_csv(target_file, float_format="%.3f") # for 3 decimal places
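
To try this without touching real files, here is a minimal self-contained sketch. The in-memory csv text and the column name value are made up for illustration, using the numbers from the question. It checks that float_precision='round_trip' yields the same floats as Python's own float(), then writes the column back with six decimal places. Depending on your pandas version, the default parser may already give the same result, so the sketch does not rely on the two differing.

```python
import io

import pandas as pd

# Hypothetical in-memory csv standing in for the real source file.
csv_text = "value\n2470.691137\n2484.30691\n1524.449677\n"

# Parse with the round-trip converter (an option of the default C engine).
df = pd.read_csv(io.StringIO(csv_text), float_precision="round_trip")
parsed = df["value"].tolist()

# Reference values: what Python's float() makes of the same text fields.
expected = [float(field) for field in csv_text.split()[1:]]
print(parsed == expected)

# With no path given, to_csv returns the csv text as a string.
# float_format="%.6f" writes every float with exactly six decimal places.
out = df.to_csv(index=False, float_format="%.6f")
print(out)
```

Note that float_format pads as well as rounds, so 2484.30691 is written as 2484.306910. If the file always has exactly six decimal places, as in the question, that matches the source format.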