I want to fill the NaNs in a dataframe with random values:
import numpy as np
import pandas as pd

df1 = pd.DataFrame(list(zip(['0001', '0001', '0002', '0003', '0004', '0004'],
                            ['a', 'b', 'a', 'b', 'a', 'b'],
                            ['USA', 'USA', 'USA', 'USA', 'USA', 'USA'],
                            [np.nan, np.nan, 'Jan', np.nan, np.nan, 'Jan'],
                            [1, 2, 3, 4, 5, 6])),
                   columns=['sample ID', 'compound', 'country', 'month', 'value'])
df1
Out:
  sample ID compound country month  value
0      0001        a     USA   NaN      1
1      0001        b     USA   NaN      2
2      0002        a     USA   Jan      3
3      0003        b     USA   NaN      4
4      0004        a     USA   NaN      5
5      0004        b     USA   Jan      6
I slice the dataframe based on the compound column:
df2 = df1.loc[df1.compound == 'a']
df2
Out:
  sample ID compound country month  value
0      0001        a     USA   NaN      1
2      0002        a     USA   Jan      3
4      0004        a     USA   NaN      5
Then I tried to use fillna with non-repeated values taken from filler:
from numpy.random import default_rng
rng = default_rng()
filler = rng.choice(len(df2.month), size=len(df2.month), replace=False)
filler = pd.Series(-abs(filler))
df2.month.fillna(filler, inplace=True)
df2
Out:
  sample ID compound country month  value
0      0001        a     USA  -1.0      1
2      0002        a     USA   Jan      3
4      0004        a     USA   NaN      5
I expected no NaN in the output, but one is still there. Why?
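The problem is that the index of filler is different from the index of df2. Since df2 is a part of df1 selected by boolean indexing, it keeps the original row labels (0, 2, 4), while filler gets a default index (0, 1, 2). fillna aligns on index labels, so only label 0 matches and the row labelled 4 stays NaN. You can give filler the same index as df2: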
filler = pd.Series(-abs(filler)).set_axis(df2.index)
# assign the result back; inplace=True on a selected column may not update df2 in newer pandas
df2['month'] = df2['month'].fillna(filler)
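To see the alignment behaviour side by side, here is a minimal self-contained sketch. It rebuilds a smaller version of the frame from the question and fills the month column once with a default-indexed series and once with a series that shares the index of df2. The seed, the .copy() call, the explicit object dtype, and the names values, misaligned, aligned and partly_filled are my own assumptions for illustration and are not part of the original code.

```python
import numpy as np
import pandas as pd
from numpy.random import default_rng

# Smaller version of the frame from the question (country/value columns omitted).
# dtype=object is an assumption so that numbers can be mixed with the 'Jan' strings.
df1 = pd.DataFrame({
    'sample ID': ['0001', '0001', '0002', '0003', '0004', '0004'],
    'compound': ['a', 'b', 'a', 'b', 'a', 'b'],
    'month': pd.Series([np.nan, np.nan, 'Jan', np.nan, np.nan, 'Jan'], dtype=object),
})

# .copy() makes df2 an independent frame, so assigning to it is unambiguous.
df2 = df1.loc[df1['compound'] == 'a'].copy()   # row labels 0, 2, 4

rng = default_rng(0)  # fixed seed is an assumption, only for repeatability
values = -rng.choice(len(df2), size=len(df2), replace=False)

misaligned = pd.Series(values)                 # default index 0, 1, 2
aligned = pd.Series(values, index=df2.index)   # index 0, 2, 4, same as df2

# fillna matches filler values to rows by index label, not by position.
partly_filled = df2['month'].fillna(misaligned)   # label 4 has no match
df2['month'] = df2['month'].fillna(aligned)       # every NaN label has a match

print(partly_filled.isna().sum())  # 1
print(df2['month'].isna().sum())   # 0
```

Passing index=df2.index when building the series has the same effect as calling set_axis(df2.index) afterwards; either way the point is that the labels, not the positions, have to line up.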