
How do I read NaN (Sodium Nitride) in pandas from csv as a string instead of NaN (Not a Number)?


I am studying materials informatics in Python. I want to treat NaN (Sodium Nitride) as a chemical formula, i.e. as a string, but pandas parses it as NaN (Not a Number), even though I pass dtype={'formula': str}.

import pandas as pd

df = pd.read_csv('sample.csv', dtype={'formula': str})
print(df.loc[0]['formula'])
# >> nan
print(type(df.loc[0]['formula']))
# >> <class 'float'>

The CSV file being read (sample.csv) looks like this:

id,formula
1,NaN
2,NaHCO3
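To reproduce the problem without creating a file, the CSV above can be embedded in an io.StringIO buffer. This is a minimal sketch assuming the same two rows as sample.csv; it shows that dtype={'formula': str} alone does not stop 'NaN' from being turned into a missing value.

```python
import io

import pandas as pd

# Same contents as sample.csv, embedded so the example runs without a file
csv_text = "id,formula\n1,NaN\n2,NaHCO3\n"

df = pd.read_csv(io.StringIO(csv_text), dtype={"formula": str})

# 'NaN' matches read_csv's default NA tokens, so it becomes a float NaN
# even though the column was requested as str
print(repr(df.loc[0, "formula"]))
print(type(df.loc[0, "formula"]))
print(df["formula"].isna().tolist())  # [True, False]
```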

Solution

  • By default, read_csv recognizes the following strings as NaN, and this check happens regardless of the dtype you request for the column:

    ''
    '#N/A'
    '#N/A N/A'
    '#NA'
    '-1.#IND'
    '-1.#QNAN'
    '-NaN'
    '-nan'
    '1.#IND'
    '1.#QNAN'
    '<NA>'
    'N/A'
    'NA'
    'NULL'
    'NaN'
    'None' (added in pandas 2.0)
    'n/a'
    'nan'
    'null'
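A quick way to see this list in action is to feed a few of its tokens through read_csv next to a real formula. This is an illustrative sketch; the token selection is arbitrary and taken from the list above.

```python
import io

import pandas as pd

# A handful of tokens from the default NA list, plus one genuine formula
tokens = ["NaN", "NA", "N/A", "null", "#N/A", "NaHCO3"]
csv_text = "formula\n" + "\n".join(tokens) + "\n"

df = pd.read_csv(io.StringIO(csv_text))

# Every default NA token is parsed as missing; only 'NaHCO3' survives as text
for token, missing in zip(tokens, df["formula"].isna()):
    print(f"{token!r:10} -> missing={missing}")
```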
    

    Pass keep_default_na=False to disable the built-in list above, together with na_values=[''] so that empty fields are still read as missing:

    df = pd.read_csv('sample.csv', na_values=[''], keep_default_na=False,
                     dtype={'formula': str})
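The sketch below checks the fix on the sample data, with one extra row (an assumption, not in the original CSV) whose formula cell is empty, to show that na_values=[''] still marks genuinely blank cells as missing while 'NaN' is kept as a string.

```python
import io

import pandas as pd

# 'NaN' is a formula in row 1; row 3 (added for illustration) has an empty formula
csv_text = "id,formula\n1,NaN\n2,NaHCO3\n3,\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    keep_default_na=False,  # ignore the built-in NA tokens ('NaN', 'NA', 'null', ...)
    na_values=[""],         # but still treat empty fields as missing
    dtype={"formula": str},
)

print(repr(df.loc[0, "formula"]))     # 'NaN'
print(df["formula"].isna().tolist())  # [False, False, True]
```

If you never need missing values at all, na_filter=False is a simpler alternative: it turns NA detection off entirely, so empty cells come back as '' instead of NaN.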