
Why does df.where() not replace all null values?


I have a DataFrame whose columns hold mixed types. I am trying to set every occurrence of None or NaN to None.

I tried the accepted answer to the question "Use None instead of np.nan for null values in pandas DataFrame", but it does not catch every null value. Example:

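import pandas as pd

my_array = ['1','2',None,4]
df = pd.DataFrame([my_array], columns=['Morning', 'Midday', 'Evening', 'Night'])
# DataFrame.append was removed in pandas 2.0; pd.concat is its replacement
df = pd.concat([df, pd.DataFrame([{'Midday': '10'}])], ignore_index=True)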

which results in

  Morning Midday Evening  Night
0       1      2    None    4.0
1     NaN     10     NaN    NaN
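The 4.0 in the Night column is a clue. A minimal check of the dtypes, assuming pandas 2.x or later (hence `pd.concat` instead of the removed `DataFrame.append`), shows that Night started as an integer column and was upcast to float64 when the new row left it without a value:

```python
import pandas as pd

my_array = ['1', '2', None, 4]
df = pd.DataFrame([my_array], columns=['Morning', 'Midday', 'Evening', 'Night'])
# pd.concat stands in for DataFrame.append, which was removed in pandas 2.0
df = pd.concat([df, pd.DataFrame([{'Midday': '10'}])], ignore_index=True)

# 'Night' was int64 before the new row; filling the gap with NaN upcast it
print(df.dtypes)
print(df['Night'].dtype)  # float64
```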

Applying df.where() to find and replace all null values gives:

df.where(df.notnull(), None)

  Morning Midday Evening  Night
0       1      2    None    4.0
1    None     10    None    NaN

But I want this output:

  Morning Midday Evening  Night
0       1      2    None    4.0
1    None     10    None    None

What am I doing wrong, or is df.where() behaving incorrectly?


Solution

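  • You need to change the dtype to object first. Night became float64 when the appended row introduced NaN, and a float64 column cannot hold the Python object None, so where() stores NaN again. Morning and Evening are already object columns, which is why they worked: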

    # cast to object so every column can hold None, then mask the nulls
    out = df.astype(object).where(df.notna(), None)
    print(out)
      Morning Midday Evening Night
    0       1      2    None   4.0
    1    None     10    None  None
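A self-contained sketch of the fix, assuming pandas 2.x or later (so `pd.concat` replaces the removed `DataFrame.append`). It contrasts plain `where()` with the `astype(object)` version and checks that every cell that was null in `df` ends up as exactly `None`; the names `naive` and `missing` are illustrative only:

```python
import pandas as pd

my_array = ['1', '2', None, 4]
df = pd.DataFrame([my_array], columns=['Morning', 'Midday', 'Evening', 'Night'])
# pd.concat stands in for DataFrame.append, which was removed in pandas 2.0
df = pd.concat([df, pd.DataFrame([{'Midday': '10'}])], ignore_index=True)

# Plain where(): the float64 'Night' column cannot store None, so pandas keeps NaN
naive = df.where(df.notna(), None)
print(naive['Night'].dtype, naive.loc[1, 'Night'])  # float64 nan

# Casting to object first lets every column hold the Python object None
out = df.astype(object).where(df.notna(), None)
print(out)

# Every cell that was null in df is now exactly None
missing = df.isna().to_numpy()
assert all(out.iat[i, j] is None for i, j in zip(*missing.nonzero()))
```

The trade-off is that every column of `out` becomes object dtype, so numeric operations on it lose pandas' vectorized fast paths. This conversion usually makes sense only as a last step, for example right before passing the rows to code that expects None rather than NaN.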