Here is an exctract of my dataset : Dataset
Here is an exemple of dataset
df = pd.DataFrame(
{'vals': np.where(np.arange(35) < 30, np.arange(35), np.nan)},
index=pd.date_range('2021-01-01', freq='12H', periods=35))
vals
2021-01-01 00:00:00 0.0
2021-01-01 12:00:00 1.0
2021-01-02 00:00:00 2.0
2021-01-02 12:00:00 3.0
2021-01-03 00:00:00 4.0
2021-01-03 12:00:00 5.0
2021-01-04 00:00:00 6.0
2021-01-04 12:00:00 7.0
2021-01-05 00:00:00 8.0
2021-01-05 12:00:00 9.0
2021-01-06 00:00:00 10.0
2021-01-06 12:00:00 11.0
2021-01-07 00:00:00 12.0
2021-01-07 12:00:00 13.0
2021-01-08 00:00:00 14.0
2021-01-08 12:00:00 15.0
2021-01-09 00:00:00 16.0
2021-01-09 12:00:00 17.0
2021-01-10 00:00:00 18.0
2021-01-10 12:00:00 19.0
2021-01-11 00:00:00 20.0
2021-01-11 12:00:00 21.0
2021-01-12 00:00:00 22.0
2021-01-12 12:00:00 23.0
2021-01-13 00:00:00 24.0
2021-01-13 12:00:00 25.0
2021-01-14 00:00:00 26.0
2021-01-14 12:00:00 27.0
2021-01-15 00:00:00 28.0
2021-01-15 12:00:00 29.0
2021-01-16 00:00:00 NaN
2021-01-16 12:00:00 NaN
2021-01-17 00:00:00 NaN
2021-01-17 12:00:00 NaN
2021-01-18 00:00:00 NaN
and for the result i would like to have :
vals
2021-01-01 00:00:00 0.0
2021-01-01 12:00:00 1.0
2021-01-02 00:00:00 2.0
2021-01-02 12:00:00 3.0
2021-01-03 00:00:00 4.0
2021-01-03 12:00:00 5.0
2021-01-04 00:00:00 6.0
2021-01-04 12:00:00 7.0
2021-01-05 00:00:00 8.0
2021-01-05 12:00:00 9.0
2021-01-06 00:00:00 10.0
2021-01-06 12:00:00 11.0
2021-01-07 00:00:00 12.0
2021-01-07 12:00:00 13.0
2021-01-08 00:00:00 14.0
2021-01-08 12:00:00 15.0
2021-01-09 00:00:00 16.0
2021-01-09 12:00:00 17.0
2021-01-10 00:00:00 18.0
2021-01-10 12:00:00 19.0
2021-01-11 00:00:00 20.0
2021-01-11 12:00:00 21.0
2021-01-12 00:00:00 22.0
2021-01-12 12:00:00 23.0
2021-01-13 00:00:00 24.0
2021-01-13 12:00:00 25.0
2021-01-14 00:00:00 26.0
2021-01-14 12:00:00 27.0
2021-01-15 00:00:00 28.0
2021-01-15 12:00:00 29.0
2021-01-16 00:00:00 16.0
2021-01-16 12:00:00 17.0
2021-01-17 00:00:00 18.0
2021-01-17 12:00:00 19.0
2021-01-18 00:00:00 20.0
My question:
I would like to fill NaN value with a value from the same column that has been observed a week before.
df.fillna(method='ffill') is not helping as it fills based on the last value. Any idea?
Simple example of a DataFrame with missing values and a datetime index:
In [2]: df = pd.DataFrame(
...: {'vals': np.where(np.arange(21) < 14, np.arange(21), np.nan)},
...: index=pd.date_range('2021-01-01', freq='D', periods=21),
...: )
...:
In [3]: df
Out[3]:
vals
2021-01-01 0.0
2021-01-02 1.0
2021-01-03 2.0
2021-01-04 3.0
2021-01-05 4.0
2021-01-06 5.0
2021-01-07 6.0
2021-01-08 7.0
2021-01-09 8.0
2021-01-10 9.0
2021-01-11 10.0
2021-01-12 11.0
2021-01-13 12.0
2021-01-14 13.0
2021-01-15 NaN
2021-01-16 NaN
2021-01-17 NaN
2021-01-18 NaN
2021-01-19 NaN
2021-01-20 NaN
2021-01-21 NaN
You can group on the week day using pandas datetime components, then use ffill to forward fill within each group:
In [4]: df.groupby(df.index.weekday).ffill()
Out[4]:
vals
2021-01-01 0.0
2021-01-02 1.0
2021-01-03 2.0
2021-01-04 3.0
2021-01-05 4.0
2021-01-06 5.0
2021-01-07 6.0
2021-01-08 7.0
2021-01-09 8.0
2021-01-10 9.0
2021-01-11 10.0
2021-01-12 11.0
2021-01-13 12.0
2021-01-14 13.0
2021-01-15 7.0
2021-01-16 8.0
2021-01-17 9.0
2021-01-18 10.0
2021-01-19 11.0
2021-01-20 12.0
2021-01-21 13.0