python pandas fillna

pd.fillna(pd.Series()) can't fill all NaN values


I want to fill the NaNs in a dataframe with random values:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(list(zip(['0001', '0001', '0002', '0003', '0004', '0004'],
                            ['a', 'b', 'a', 'b', 'a', 'b'],
                            ['USA', 'USA', 'USA', 'USA', 'USA', 'USA'],
                            [np.nan, np.nan, 'Jan', np.nan, np.nan, 'Jan'],
                            [1, 2, 3, 4, 5, 6])),
                   columns=['sample ID', 'compound', 'country', 'month', 'value'])
df1

Out:

    sample ID   compound    country month   value
0   0001          a           USA   NaN      1
1   0001          b           USA   NaN      2
2   0002          a           USA   Jan      3
3   0003          b           USA   NaN      4
4   0004          a           USA   NaN      5 
5   0004          b           USA   Jan      6

I slice the dataframe based on the compound column:

df2 = df1.loc[df1.compound == 'a']
df2

Out:

  sample ID  compound   country month   value
0   0001      a           USA   NaN      1
2   0002      a           USA   Jan      3
4   0004      a           USA   NaN      5

Then I tried to fill the NaNs with non-repeating values using filler:

from numpy.random import default_rng

rng = default_rng()
filler = rng.choice(len(df2.month), size=len(df2.month), replace=False)
filler = pd.Series(-abs(filler))

df2.month.fillna(filler, inplace=True)
df2

Out:

   sample ID    compound    country month   value
0   0001           a         USA    -1.0    1
2   0002           a         USA    Jan     3
4   0004           a         USA    NaN     5 

I expected no NaN in the output, but one remains. Why?


Solution

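  • The problem is that the index of filler differs from the index of df2. Because df2 was taken from df1 by boolean indexing, it keeps the labels 0, 2, 4, while filler has the default index 0, 1, 2. fillna aligns on index labels, so only the NaN at label 0 is filled. Give filler the same index as df2: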

    filler = pd.Series(-abs(filler)).set_axis(df2.index)
    # Assign the result back; recent pandas warns about inplace=True on a selected column
    df2 = df2.assign(month=df2['month'].fillna(filler))
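
To see the alignment effect in isolation, here is a minimal sketch. It rebuilds only the month column of the 'a' slice by hand, so the names month, values, misaligned and aligned, and the seed 0, are assumptions made for illustration and are not from the question. It also passes the index when the filler Series is built, which should be equivalent to calling set_axis afterwards.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for df2['month']; the slice keeps the labels 0, 2, 4.
month = pd.Series([np.nan, "Jan", np.nan], index=[0, 2, 4],
                  name="month", dtype=object)

rng = np.random.default_rng(0)  # seed chosen arbitrarily, for reproducibility
# choice() returns non-negative ints, so plain negation matches -abs(...)
values = -rng.choice(len(month), size=len(month), replace=False)

# Default index 0, 1, 2: label 4 has no partner, so that NaN survives.
misaligned = month.fillna(pd.Series(values))

# Same values labelled with the slice's own index: every NaN is filled.
aligned = month.fillna(pd.Series(values, index=month.index))

print(misaligned.isna().sum())  # 1
print(aligned.isna().sum())     # 0
```

In the misaligned case the value at filler label 1 is never used, and the value at label 2 lines up with 'Jan', which is not NaN and is therefore left alone.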