I have data from a heat device which is measuring the power consumption over one year.The Data have of course some NaN values (Around 600). I already resampled the data to 15 min time stamps. The Dataframe looks like follows:
timestamp ... Power_consumption
0 2021-06-01 00:15:00 ... 12.7
1 2021-06-01 00:30:00 ... 12.5
2 2021-06-01 00:45:00 ... 12.4
3 2021-06-01 01:00:00 ... 12.3
4 2021-06-01 01:15:00 ... 12.2
5 2021-06-01 00:15:00 ... 12.7
6 2021-06-01 00:30:00 ... 12.5
7 2021-06-01 00:45:00 ... 12.4
8 2021-06-01 01:00:00 ... 13.5
9 2021-06-01 01:15:00 ... 12.2
.
.
.
97 2021-06-02 00:45:00 ... 12.4
98 2021-06-02 01:00:00 ... 12.3
99 2021-06-02 01:15:00 ... 12.2
100 2021-06-02 00:15:00 ... 12.7
101 2021-06-02 00:30:00 ... 12.5
102 2021-06-02 00:45:00 ... NaN
103 2021-06-02 01:00:00 ... 12.5
104 2021-06-02 01:15:00 ... 12.2
I want to fill now e.g. the 2021-06-02 00:45:00 with the value from the day before at the same time (2021-06-01 00:45:00 ) and if this value is also NaN, I would like to take the value from exactly one week before, same time. Whats the best way to do so ?
Initialize the dataframe
df = pd.DataFrame([
["2021-06-01 00:15:00",12.7],
["2021-06-01 00:30:00",12.5],
["2021-06-01 00:45:00",2.4],
["2021-06-01 01:00:00",12.3],
["2021-06-01 01:15:00",2.2],
["2021-06-01 01:30:00",12.5],
["2021-06-01 01:45:00",12.4],
["2021-06-01 02:00:00",13.5],
["2021-06-01 02:15:00",12.2],
["2021-06-02 00:15:00",13.7],
["2021-06-02 00:30:00",13.5],
["2021-06-02 00:45:00",23.4],
["2021-06-02 01:00:00",13.3],
["2021-06-02 01:15:00",np.NaN],
["2021-06-02 01:30:00",np.NaN],
["2021-06-02 01:45:00",13.4],
["2021-06-02 02:00:00",13.5],
["2021-06-02 02:15:00",13.2],
], columns = ["timestamp", "Power_consumption"])
Replacing nans
# converting the column to pandas datetime
df["timestamp"] = pd.to_datetime(df["timestamp"])
# Apply time delta of 1 and 7 for the rows having null values
df["oneday"] = df.loc[df.Power_consumption.isna()].apply(lambda x: x["timestamp"] - timedelta(1), axis=1)
df["oneweek"] = df.loc[df.Power_consumption.isna()].apply(lambda x: x["timestamp"] - timedelta(7), axis=1)
# Set timestamp as index
df = df.set_index("timestamp")
# For the rows where the power consumption is null, oneday column will be set. Query such rows
prev_day_timestamp = df[df.oneday.notna()]["oneday"]
# Query and get the power consumption values of previous date from the dataframe
one_day_values = df.loc[df.index.isin(prev_day_timestamp), "Power_consumption"].reset_index()["Power_consumption"]
# Apply the value to the nan rows
df.loc[df.Power_consumption.isnull(), "Power_consumption"] = one_day_values.tolist()
# Drop the extra columns
df.drop(["oneday"], inplace=True, axis=1)
# Reset the index
df = df.reset_index()
At the end if you find nan, you can apply the same logic again with oneweek column.
Sample output
timestamp Power_consumption
0 2021-06-01 00:15:00 12.7
1 2021-06-01 00:30:00 12.5
2 2021-06-01 00:45:00 2.4
3 2021-06-01 01:00:00 12.3
4 2021-06-01 01:15:00 2.2
5 2021-06-01 01:30:00 12.5
6 2021-06-01 01:45:00 12.4
7 2021-06-01 02:00:00 13.5
8 2021-06-01 02:15:00 12.2
9 2021-06-02 00:15:00 13.7
10 2021-06-02 00:30:00 13.5
11 2021-06-02 00:45:00 23.4
12 2021-06-02 01:00:00 13.3
13 2021-06-02 01:15:00 2.2
14 2021-06-02 01:30:00 12.5
15 2021-06-02 01:45:00 13.4
16 2021-06-02 02:00:00 13.5
17 2021-06-02 02:15:00 13.2
may not be a most effective solution for a large input.. But it works :-)