I have historical data from 2012 to 2023. I am trying to calculate the average for every hour in every day over these years to build a 'reference year' or 'baseline'. My DataFrame (final_df) looks like this:
Timestamp bruttoPower [kW] nettoPower [kW]
0 2012-01-01 00:00:00 1501.500000 375.375000
1 2012-01-01 01:00:00 1488.833333 372.208333
2 2012-01-01 02:00:00 1626.833333 406.708333
3 2012-01-01 03:00:00 1350.333333 337.583333
4 2012-01-01 04:00:00 1424.000000 356.000000
... ... ... ...
105187 2023-12-31 19:00:00 3301.758391 763.500000
105188 2023-12-31 20:00:00 2322.399977 592.500000
105189 2023-12-31 21:00:00 3393.225006 861.000000
105190 2023-12-31 22:00:00 3784.991643 952.000000
105191 2023-12-31 23:00:00 3661.591654 950.500000
105192 rows × 5 columns
I had already asked a similar question: Calculate mean values of the past x years for every month
But know I am trying to get the average of each hour on every day over the years (2012-2023).
With help of the answer I got on my last post and ChatGPT I tried this:
import calendar
import pandas as pd
baseline_df = (
final_df.groupby(final_df.pop("Timestamp").dt.date)
.mean()
.reset_index()
.rename(columns={"Timestamp":"Date"})
)
baseline_df["Date"] = baseline_df["Date"].apply(lambda x: x.strftime("%Y-%m-%d"))
display(baseline_df)
which prints:
Date brutto Energie [kWh] netto Energie [kWh]
2012-01-01 3440.645833 3337.426458
2012-01-02 5163.152778 5008.258194
2012-01-03 8374.173611 8122.948403
... ... ...
2023-12-29 6554.490294 6471.125000
2023-12-30 4346.124824 4291.750000
and
import calendar
import pandas as pd
final_df['Date'] = final_df['Timestamp'].dt.date
final_df['Hour'] = final_df['Timestamp'].dt.hour
baseline_df = (final_df.groupby(['Date', 'Hour']).mean().reset_index())
baseline_df["Hour"] = baseline_df["Hour"].apply(lambda x: f"{x:02}")
display(baseline_df)
which prints:
Date Hour Timestamp brutto Energie [kWh] netto Energie [kWh]
2012-01-01 00 2012-01-01 00:00:00 1501.500000 1456.455000
2012-01-01 01 2012-01-01 01:00:00 1488.833333 1444.168333
2012-01-01 02 2012-01-01 02:00:00 1626.833333 1578.028333
... ... ... ... ...
2023-12-31 19 2023-12-31 19:00:00 3301.758391 3252.000000
2023-12-31 20 2023-12-31 20:00:00 2322.399977 2298.000000
but it is not working as I would like. I am trying to get something like this:
Timestamp brutto Power [kW] netto Power [kW]
01-01 00:00:00 1501.500000 375.375000
01-01 01:00:00 1488.833333 372.208333
01-01 02:00:00 1626.833333 406.708333
01-01 03:00:00 1350.333333 1309.823333
... ... ...
31-12 22:00:00 3393.225006 861.000000
31-12 23:00:00 3784.991643 952.000000
IIUC you want to have only one column "Timestamp" with one hour intervals:
t = df.pop("Timestamp")
out = (
df.groupby([t.dt.date, t.dt.hour])
.mean()
.rename_axis(index=["Date", "Hour"])
.reset_index()
)
out["Timestamp"] = pd.to_datetime(
out.pop("Date").astype(str) + " " + out.pop("Hour").astype(str) + ":00"
)
print(out)
Prints:
bruttoPower [kW] nettoPower [kW] Timestamp
0 1501.500000 375.375000 2012-01-01 00:00:00
1 1488.833333 372.208333 2012-01-01 01:00:00
2 1626.833333 406.708333 2012-01-01 02:00:00
3 1350.333333 337.583333 2012-01-01 03:00:00
4 1424.000000 356.000000 2012-01-01 04:00:00
5 3301.758391 763.500000 2023-12-31 19:00:00
6 2322.399977 592.500000 2023-12-31 20:00:00
7 3393.225006 861.000000 2023-12-31 21:00:00
8 3784.991643 952.000000 2023-12-31 22:00:00
9 3661.591654 950.500000 2023-12-31 23:00:00
OR: Group by month/day/hour:
t = df.pop("Timestamp")
out = (
df.groupby([t.dt.month, t.dt.day, t.dt.hour])
.mean()
.rename_axis(index=["Month", "Day", "Hour"])
.reset_index()
)
out["Timestamp"] = (
out.pop("Day").astype(str).str.zfill(2)
+ "-"
+ out.pop("Month").astype(str).str.zfill(2)
+ " "
+ out.pop("Hour").astype(str).str.zfill(2)
+ ":00:00"
)
print(out)
Prints:
bruttoPower [kW] nettoPower [kW] Timestamp
0 1501.500000 375.375000 01-01 00:00:00
1 1488.833333 372.208333 01-01 01:00:00
2 1626.833333 406.708333 01-01 02:00:00
3 1350.333333 337.583333 01-01 03:00:00
4 1424.000000 356.000000 01-01 04:00:00
5 3301.758391 763.500000 31-12 19:00:00
6 2322.399977 592.500000 31-12 20:00:00
7 3393.225006 861.000000 31-12 21:00:00
8 3784.991643 952.000000 31-12 22:00:00
9 3661.591654 950.500000 31-12 23:00:00