Search code examples
pythonpandascsvnanconverters

How to convert non-numeric entries to NaN in read_csv


I am reading in a file with:

pd.read_csv("file.csv", dtype={'ID_1':float})

The file looks like:

ID_0, ID_1,ID_2
a,002,c
b,004,d
c,   ,e       
n,003,g

Unfortunately read_csv fails complaining it can't convert ' ' to a float.

What is the right way to read in a csv and convert anything that can't be converted to a float into NaN?


Solution

  • If you don't specify the dtype param and pass skipinitialspace=True then it will just work:

    In [4]:
    t="""ID_0,ID_1,ID_2
    a,002,c
    b,004,d
    c,   ,e
    n,003,g"""
    
    pd.read_csv(io.StringIO(t), skipinitialspace=True)
    Out[4]:
      ID_0  ID_1 ID_2
    0    a   2.0    c
    1    b   4.0    d
    2    c   NaN    e
    3    n   3.0    g
    

    So in your case:

    pd.read_csv("file.csv", skipinitialspace=True)
    

    will just work

    You can see that the dtypes are as expected:

    In [5]:
    pd.read_csv(io.StringIO(t), skipinitialspace=True).info()
    
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 4 entries, 0 to 3
    Data columns (total 3 columns):
    ID_0    4 non-null object
    ID_1    3 non-null float64
    ID_2    4 non-null object
    dtypes: float64(1), object(2)
    memory usage: 176.0+ bytes