I'm looking to take a pandas series and fill NaN
with the average of the next numerical value where: average = next numerical value / (# consecutive NaNs + 1)
Here's my code so far, I just can't figure out how to divide the filler
column among the NaN
s (and the next numerical value as well) in num
:
import pandas as pd
dates = pd.date_range(start = '1/1/2016',end = '1/12/2016', freq = 'D')
nums = [10, 12, None, None, 39, 10, 11, None, None, None, None, 60]
df = pd.DataFrame({
'date':dates,
'num':nums
})
df['filler'] = df['num'].fillna(method = 'bfill')
Current Output:
date num filler
0 2016-01-01 10.0 10.0
1 2016-01-02 12.0 12.0
2 2016-01-03 NaN 39.0
3 2016-01-04 NaN 39.0
4 2016-01-05 39.0 39.0
5 2016-01-06 10.0 10.0
6 2016-01-07 11.0 11.0
7 2016-01-08 NaN 60.0
8 2016-01-09 NaN 60.0
9 2016-01-10 NaN 60.0
10 2016-01-11 NaN 60.0
11 2016-01-12 60.0 60.0
Desired Output:
date num
0 2016-01-01 10.0
1 2016-01-02 12.0
2 2016-01-03 13.0
3 2016-01-04 13.0
4 2016-01-05 13.0
5 2016-01-06 10.0
6 2016-01-07 11.0
7 2016-01-08 12.0
8 2016-01-09 12.0
9 2016-01-10 12.0
10 2016-01-11 12.0
11 2016-01-12 12.0
cumsum
of notnull
groupby
and transform
with mean
csum = df.num.notnull()[::-1].cumsum()
filler = df.num.fillna(0).groupby(csum).transform('mean')
df.assign(filler=filler)
date num filler
0 2016-01-01 10.0 10.0
1 2016-01-02 12.0 12.0
2 2016-01-03 NaN 13.0
3 2016-01-04 NaN 13.0
4 2016-01-05 39.0 13.0
5 2016-01-06 10.0 10.0
6 2016-01-07 11.0 11.0
7 2016-01-08 NaN 12.0
8 2016-01-09 NaN 12.0
9 2016-01-10 NaN 12.0
10 2016-01-11 NaN 12.0
11 2016-01-12 60.0 12.0
how it works
df.num.notnull().cumsum()
is a standard technique to find groups of contiguous nulls. However, I wanted my groups to end with the next numeric value. So I reversed the series and then cumsum
'd.transform
to broadcast across the existing indexassign
new column. Despite having reversed the series, the index will realign like magic. Could have used loc
but that overwrites the existing df
. I'll let OP decide to overwrite if they want to.