I have historical data from 2012 to 2023. I am trying to calculate the average for each calendar month across all of these years (one January value, one February value, and so on) to build a 'reference year' or 'baseline'. My DataFrame (final_df) looks like this:
                  Timestamp  bruttoPower [kW]  nettoPower [kW]
0       2012-01-01 00:00:00       1501.500000       375.375000
1       2012-01-01 01:00:00       1488.833333       372.208333
2       2012-01-01 02:00:00       1626.833333       406.708333
3       2012-01-01 03:00:00       1350.333333       337.583333
4       2012-01-01 04:00:00       1424.000000       356.000000
...                     ...               ...              ...
105187  2023-12-31 19:00:00       3301.758391       763.500000
105188  2023-12-31 20:00:00       2322.399977       592.500000
105189  2023-12-31 21:00:00       3393.225006       861.000000
105190  2023-12-31 22:00:00       3784.991643       952.000000
105191  2023-12-31 23:00:00       3661.591654       950.500000
105192 rows × 5 columns
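105192 rows is exactly the number of hours from 2012-01-01 00:00 to 2023-12-31 23:00 (nine 8760-hour years plus the leap years 2012, 2016 and 2020 at 8784 hours each). That is consistent with one row per hour and no gaps, assuming there are no duplicated timestamps. Gaps matter for a baseline because a plain mean silently gives less weight to months with missing hours. A minimal sketch of checking this, assuming Timestamp is (or can be parsed as) a datetime column; the helper name `missing_hours` is hypothetical:

```python
import pandas as pd

def missing_hours(df: pd.DataFrame, col: str = "Timestamp") -> pd.DatetimeIndex:
    """Hypothetical helper: hourly timestamps between the first and last row that are absent from df[col]."""
    ts = pd.to_datetime(df[col])
    expected = pd.date_range(ts.min(), ts.max(), freq=pd.Timedelta(hours=1))
    return expected.difference(pd.DatetimeIndex(ts))

# Complete hourly range for 2012-2023: 9 * 8760 + 3 * 8784 hours
full = pd.date_range("2012-01-01 00:00", "2023-12-31 23:00", freq=pd.Timedelta(hours=1))
print(len(full))

# Synthetic demo: drop one hour on purpose and detect it
demo = pd.DataFrame({"Timestamp": full.delete(5)})
print(missing_hours(demo))
```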
I tried the same approach as in this post, since it is basically the same problem, but it didn't work: Pandas, how to calculate mean values of the past n years for every month
I also tried asking ChatGPT, but its suggestion didn't work either:
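final_df.reset_index(inplace=True)  # Timestamp is already a column, so this only adds an "index" column
final_df['Year'] = final_df['Timestamp'].dt.year
final_df['Month'] = final_df['Timestamp'].dt.month
# Sums (not averages) within each (Year, Month) pair: one row per month per year, not one per calendar month.
# Recent pandas versions may also raise a TypeError here, because the datetime Timestamp column cannot be summed.
df_new = final_df.groupby(['Year', 'Month']).sum()
df_new.reset_index(inplace=True)
display(df_new)  # display() is only available in Jupyter/IPython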
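The attempt above creates Year and Month columns but sums inside each (Year, Month) group, so it yields monthly totals per year (144 rows for 2012 to 2023) instead of a 12-row average. A minimal sketch of one way to finish that Year/Month route, assuming Timestamp is a datetime column: average within each (Year, Month), then average those monthly means across years. This "mean of monthly means" gives every year equal weight. It can differ slightly from a plain mean over all hours of a month, for example for February (leap-year Februaries have more hours) or when some hours are missing. The function name `monthly_baseline` and the sample data are hypothetical:

```python
import pandas as pd

def monthly_baseline(df: pd.DataFrame, value_cols: list[str]) -> pd.DataFrame:
    """Hypothetical helper: mean of per-year monthly means, one row per calendar month (1-12)."""
    ts = pd.to_datetime(df["Timestamp"])
    per_year = (
        df[value_cols]
        .groupby([ts.dt.year.rename("Year"), ts.dt.month.rename("Month")])
        .mean()  # average within each (Year, Month) pair
    )
    # Average the monthly means across years, leaving one row per Month
    return per_year.groupby(level="Month").mean()

# Synthetic hourly sample: constant 1.0 kW in 2012 and 3.0 kW in 2013
idx = pd.date_range("2012-01-01", "2013-12-31 23:00", freq=pd.Timedelta(hours=1))
sample = pd.DataFrame({"Timestamp": idx, "bruttoPower [kW]": (idx.year == 2013) * 2.0 + 1.0})
res = monthly_baseline(sample, ["bruttoPower [kW]"])
print(res)
```

In this synthetic case every month averages to 2.0, because each year contributes one monthly mean.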
I am expecting something like this
(the values below are placeholders; I don't know what the real averages would be):
     Month  bruttoPower [kW]  nettoPower [kW]
0      Jan       1501.500000       375.375000
1      Feb       1488.833333       372.208333
2      Mar       1626.833333       406.708333
3      Apr       1350.333333       337.583333
4      May       1424.000000       356.000000
...    ...               ...              ...
10     Nov       3393.225006       861.000000
11     Dec       3784.991643       952.000000
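If I understand you correctly, you can group by the calendar month of Timestamp and take the mean across all years. The output below uses only the sample rows from the question; with the real data the final DataFrame should have 12 rows: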
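import calendar
import pandas as pd

# df is the question's final_df; make sure Timestamp is a real datetime column
df["Timestamp"] = pd.to_datetime(df["Timestamp"])

out = (
    # pop() removes Timestamp from df and returns it; group every row by its calendar month (1-12)
    df.groupby(df.pop("Timestamp").dt.month)
    .mean()  # average of all hourly values in that month, across all years
    .reset_index()
    .rename(columns={"Timestamp": "Month"})
)
# Replace month numbers with abbreviations (1 -> "Jan", ..., 12 -> "Dec")
out["Month"] = out["Month"].apply(lambda x: calendar.month_abbr[x])
print(out)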
Prints:
   Month  bruttoPower [kW]  nettoPower [kW]
0    Jan       1478.300000          369.575
1    Dec       3292.793334          823.900
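To check this without the full file, here is a self-contained sketch that rebuilds only the ten sample rows shown in the question (so only Jan and Dec appear) as a stand-in for final_df. It is a non-mutating variant of the same groupby: it selects the value columns instead of using pop(), and maps month numbers to names with rename(index=...) and calendar.month_abbr:

```python
import calendar
import pandas as pd

# The ten rows shown in the question (stand-in for final_df)
df = pd.DataFrame({
    "Timestamp": pd.to_datetime(
        [f"2012-01-01 {h:02d}:00" for h in range(5)]
        + [f"2023-12-31 {h:02d}:00" for h in range(19, 24)]
    ),
    "bruttoPower [kW]": [1501.5, 1488.833333, 1626.833333, 1350.333333, 1424.0,
                         3301.758391, 2322.399977, 3393.225006, 3784.991643, 3661.591654],
    "nettoPower [kW]": [375.375, 372.208333, 406.708333, 337.583333, 356.0,
                        763.5, 592.5, 861.0, 952.0, 950.5],
})

value_cols = ["bruttoPower [kW]", "nettoPower [kW]"]
out = (
    df[value_cols]
    .groupby(df["Timestamp"].dt.month)  # calendar month 1-12; df itself is left unchanged
    .mean()
    .rename(index=lambda m: calendar.month_abbr[m])  # 1 -> "Jan", 12 -> "Dec"
    .rename_axis("Month")
    .reset_index()
)
print(out)
```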