Search code examples
python-3.xpandasexport-to-csv

Why is pandas.to_csv dropping numbers when trying to preserve NaN?


Given a pandas dataframe

 df = pd.DataFrame([(290122, 0.20, np.nan),
                    (1900, 1.20, "ABC")],
                     columns = ("number", "x", "class")
                  )

     number    x class
 0   290122  0.2   NaN
 1     1900  1.2   ABC

Then exporting it to a csv, I would like to keep the NaN, e.g. as "NULL" or "NaN",

 df.to_csv("df.csv", encoding="utf-8", index=False, na_rep="NULL")

Yet, opening the csv I get the following: That is, the last two digits of number in the first cell are dropped.

enter image description here Here is the output opened in text editor:

 number,x,class
 2901,0.20,NULL
 1900,1.20,ABC

As mentionned, when dropping the na_rep argument, I obtain as expected:

 number,x,class
 290122,0.20,
 1900,1.20,ABC

Solution

  • Yes this is in fact a bug in pandas 1.0.0. Fixed in 1.0.1. See release notes and https://github.com/pandas-dev/pandas/issues/25099.

    Depending on you data a quick work around could be:

    import numpy as np
    import pandas as pd
    
    na_rep = 'NULL'
    if pd.__version__ == '1.0.0':
        na_rep_wrk = 8 * na_rep
    
    data = [(290122, 0.20, 'NULL'), (2**40 - 1, 3.20, 'NULL'), (1900, 1.20, "ABC")]
    df = pd.DataFrame(data, columns=("number", "x", "class"))
    
    df.to_csv("df.csv", encoding="utf-8", index=False, na_rep=na_rep_wrk)
    df2 = pd.read_csv('df.csv', keep_default_na=False)
    
    assert(np.all(df == df2))
    
    

    This gives the csv file:

    number,x,class
    290122,0.2,NULL
    109951162777,3.2,NULL
    1900,1.2,ABC