python, numpy, pandas, nan, python-unicode

Count NaNs when unicode values present


Good morning all,

I have a pandas DataFrame containing multiple Series. One of those Series holds a mix of unicode strings, NaN, and int/float values. I want to count the NaNs in that Series, but I cannot use the built-in numpy.isnan because it cannot safely cast the unicode data into a type it can handle. I have a workaround, but I'm wondering whether there is a better, more Pythonic way of accomplishing this task.

Thanks in advance, Myles

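import pandas as pd
import numpy as np

# Mixed-type Series: NaN, an int, and a unicode string (object dtype)
test = pd.Series(data=[np.nan, 2, u'string'])
np.isnan(test).sum()
# Raises TypeError: ufunc 'isnan' not supported for the input types

# Workaround: drop the unicode values, then count the NaNs that remain
# (unicode is the Python 2 text type; on Python 3 use str instead)
test2 = [x for x in test if not isinstance(x, unicode)]
numNaNs = np.isnan(test2).sum()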

Solution

  • Use pandas.isnull:

    In [24]: test = pd.Series(data = [np.nan, 2, u'string'])
    
    In [25]: pd.isnull(test)
    Out[25]: 
    0     True
    1    False
    2    False
    dtype: bool
    

    Note, however, that pd.isnull also treats None as missing and returns True for it:

    In [28]: pd.isnull([np.nan, 2, u'string', None])
    Out[28]: array([ True, False, False,  True], dtype=bool)
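
The answer shows the boolean mask but not the count the question asks for. As a minimal sketch, summing the mask gives the number of missing values. The second count is an assumption about what you might want if None should not be treated as NaN; the names num_missing and num_nan_only are illustrative only.

```python
import numpy as np
import pandas as pd

# Object-dtype Series mixing NaN, a number, text, and None.
test = pd.Series([np.nan, 2, u'string', None])

# pd.isnull returns a boolean mask; summing it counts the True entries.
# This count includes None as well as NaN.
num_missing = int(pd.isnull(test).sum())

# Assumption: to ignore None, count only values that are floats and NaN.
# The isinstance check runs first, so np.isnan never sees a string or None.
num_nan_only = int(sum(isinstance(x, float) and np.isnan(x) for x in test))

print(num_missing)   # 2
print(num_nan_only)  # 1
```

For the three-element Series from the question, which contains no None, both counts are the same.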