I am running Python 3.5 and pandas 0.19.2. I have a DataFrame like the one below. Forward-filling the missing values is straightforward.
import pandas as pd
import numpy as np
d = {'A': np.array([10, np.nan, np.nan, -3, np.nan, 4, np.nan, 0]),
'B': np.array([np.nan, np.nan, 5, -3, np.nan, np.nan, 0, np.nan ])}
df = pd.DataFrame(d)
df_filled = df.fillna(axis='index', method='ffill')
print(df_filled)
Out[8]:
      A    B
0  10.0  NaN
1  10.0  NaN
2  10.0  5.0
3  -3.0 -3.0
4  -3.0 -3.0
5   4.0 -3.0
6   4.0  0.0
7   0.0  0.0
My question is: what is the best way to implement a forward fill with decay? I understand that df.ffill() and df.fillna() do not support this. For instance, the output I am after is shown below (in contrast with the regular ffill above), where the carried-over value halves at each period:
Out[5]:
      A     B
0  10.0   NaN
1   5.0   NaN
2   2.5   5.0
3  -3.0  -3.0
4  -1.5  -1.5
5   4.0 -0.75
6   2.0   0.0
7   0.0   0.0
Yes, there's no simple way to do this. I'd recommend doing this one column at a time, using groupby and apply.
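for c in df:
    # Each non-null value starts a new group; the NaNs that follow share it.
    # group_keys=False keeps the original index on newer pandas versions.
    df[c] = df[c].groupby(df[c].notnull().cumsum(), group_keys=False).apply(
        # Forward-fill within the group, then halve once per step from its start.
        lambda y: y.ffill() / 2 ** np.arange(len(y))
    )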
df
      A     B
0  10.0   NaN
1   5.0   NaN
2   2.5  5.00
3  -3.0 -3.00
4  -1.5 -1.50
5   4.0 -0.75
6   2.0  0.00
7   0.0  0.00
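If you need a decay rate other than one half, the same grouping idea can be wrapped in a small helper. The following is only a sketch: the function name ffill_decay and its factor parameter are made up for illustration and are not part of pandas. It replaces apply with cumcount, which numbers the rows within each run, so the carried value can be scaled by factor raised to the number of steps since the last observed value.

```python
import numpy as np
import pandas as pd


def ffill_decay(frame, factor=0.5):
    """Forward-fill NaNs, scaling the carried value by `factor` per step.

    Hypothetical helper for illustration; not a pandas API.
    """
    filled = frame.ffill()
    out = {}
    for col in frame:
        observed = frame[col].notnull()
        # Each observed value starts a new run; leading NaNs form run 0.
        run_id = observed.cumsum()
        # Steps since the run started (0 on the observed value itself).
        age = observed.groupby(run_id).cumcount()
        # Leading NaNs stay NaN because NaN times anything is NaN.
        out[col] = filled[col] * factor ** age
    return pd.DataFrame(out, index=frame.index, columns=frame.columns)


df = pd.DataFrame({
    'A': [10, np.nan, np.nan, -3, np.nan, 4, np.nan, 0],
    'B': [np.nan, np.nan, 5, -3, np.nan, np.nan, 0, np.nan],
})
result = ffill_decay(df)
print(result)
```

With the default factor of 0.5 this should give the same values as the loop above, and it does not modify df in place.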