How to test for NaNs in an apply function in pandas?


I have a simple function that I apply to some of the columns, but it keeps getting tripped up by NaN values in pandas.

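import random
import re
import numpy as np
import pandas as pd

# Mixing ints and strings makes np.array coerce every value to a string
input_data = np.array(
    [
        [random.randint(0, 9) for x in range(2)] + [''] + ['g'],
        [random.randint(0, 9) for x in range(3)] + ['g'],
        [random.randint(0, 9) for x in range(3)] + ['a'],
        [random.randint(0, 9) for x in range(3)] + ['b'],
        [random.randint(0, 9) for x in range(3)] + ['b'],
    ]
)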

input_df = pd.DataFrame(data=input_data, columns=['B', 'C', 'D', 'label'])

I have a simple lambda like this:

input_df['D'].apply(lambda aCode: re.sub('\.', '', aCode) if not np.isnan(aCode) else aCode)

And it gets tripped up by the NaN values:

File "<pyshell#460>", line 1, in <lambda>
    input_df['D'].apply(lambda aCode: re.sub('\.', '', aCode) if not np.isnan(aCode) else aCode)
TypeError: Not implemented for this type

So I tried testing directly for the NaN values that pandas adds:

np.isnan(input_df['D'].values[0])
np.isnan(input_df['D'].iloc[0])

Both get the same error.

I do not know how to test for nan values other than np.isnan. Is there an easier way to do this? Thanks.


Solution

  • Your code fails because the first entry is an empty string, and np.isnan only accepts numeric input, so it raises a TypeError for strings:

    In [55]:
    input_df['D'].iloc[0]
    
    Out[55]:
    ''
    
    In [56]:
    np.isnan('')
    
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-56-a9f139a0c5b8> in <module>()
    ----> 1 np.isnan('')
    
    TypeError: Not implemented for this type
    

    pd.notnull accepts any object, strings included, so it does work:

    In [57]:
    import re
    input_df['D'].apply(lambda aCode: re.sub(r'\.', '', aCode) if pd.notnull(aCode) else aCode)
    
    Out[57]:
    0     
    1    3
    2    3
    3    0
    4    3
    Name: D, dtype: object
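
    As a rough illustration of the difference, here is a minimal sketch of a null-safe helper built on pd.isna (the counterpart of pd.notnull). The names strip_dots, samples, and codes are made up for this example and do not come from the question; the sample values are illustrative only.

```python
import re

import numpy as np
import pandas as pd

# pd.isna accepts any scalar, whereas np.isnan raises TypeError for strings.
samples = ['', '3', 3.0, np.nan, None]
flags = [bool(pd.isna(value)) for value in samples]


def strip_dots(code):
    """Remove literal dots from a string; pass missing values through unchanged."""
    if pd.isna(code):
        return code
    return re.sub(r'\.', '', code)


# Hypothetical column mixing strings, an empty string, and a real NaN.
codes = pd.Series(['1.2', '', np.nan, 'a.b.c'], dtype=object)
cleaned = codes.apply(strip_dots)
print(flags)
print(cleaned)
```

    Note that an empty string is not treated as missing, so it is passed to re.sub like any other string.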
    

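    However, if all you want is to replace a substring, use the vectorized .str.replace, which skips missing values on its own. Pass regex=True explicitly, because pandas 2.0 and later treat the pattern as a literal string by default:

    In [58]:
    input_df['D'].str.replace(r'\.', '', regex=True)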
    
    Out[58]:
    0     
    1    3
    2    3
    3    0
    4    3
    Name: D, dtype: object
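
    Since the goal here is to drop a literal dot, a regex is not strictly needed. The sketch below, using a made-up Series named codes, shows the literal form with regex=False and one possible way to turn empty strings into real NaN values so that isna can detect them. Whether empty strings should count as missing is an assumption about the data, not something the question states.

```python
import numpy as np
import pandas as pd

# Hypothetical column of string codes, including an empty string.
codes = pd.Series(['1.5', '', '3', 'a.b'], dtype=object)

# regex=False treats '.' as a plain character, so no escaping is needed.
no_dots = codes.str.replace('.', '', regex=False)

# Optional step: convert empty strings to NaN so they register as missing.
with_nan = no_dots.replace('', np.nan)
missing_mask = with_nan.isna()

print(no_dots.tolist())
print(missing_mask.tolist())
```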