Search code examples
pythonpandasdataframenanna

Converting all <NA> into nan in pandas dataframe - python


I have a pandas dataframe with floats and strings and few <NA> and nan. I am trying to locate all the <NA> and convert them into nan using the following function pd.to_numeric(....., errors='coerce'). Also making sure that floats and strings remain untouched. Could you please help me with it? Also when I df.to_dict('list') I get few '' and <NA>. Thank you.


Solution

  • Whether you have string <NA> or pandas NA (pd.NA), both should be converted to nan using pd.to_numeric:

    df = pd.DataFrame({'col': [1, '1', 'a', float('nan'), pd.NA, '<NA>']})
    
    pd.to_numeric(df['col'], errors='coerce')
    

    Output:

    0    1.0
    1    1.0
    2    NaN
    3    NaN
    4    NaN
    5    NaN
    Name: col, dtype: float64
    

    If you want to replace specific items, keeping the strings, rather use:

    df['col'].replace({'<NA>': float('nan'), pd.NA: float('nan')})
    

    Output:

    0      1
    1      1
    2      a
    3    NaN
    4    NaN
    5    NaN
    Name: col, dtype: object
    

    If you need string representations of number as numbers, and other strings intact while removing <NA>/pd.NA:

    out = (pd.to_numeric(df['col'], errors='coerce')
             .fillna(df['col'].replace({'<NA>': float('nan'), pd.NA: float('nan')}))
           )
    

    Output:

    0    1.0
    1    1.0
    2      a
    3    NaN
    4    NaN
    5    NaN
    Name: col, dtype: object