python pandas dataframe group-by time-series

Calculate mean values of the past x years for every month

I have hourly historical data from 2012 to 2023. I am trying to calculate the average for each calendar month across all of these years to build a 'reference year' or 'baseline'. My DataFrame (final_df) looks like this:

                Timestamp             bruttoPower [kW]      nettoPower [kW]     
0               2012-01-01 00:00:00   1501.500000           375.375000          
1               2012-01-01 01:00:00   1488.833333           372.208333          
2               2012-01-01 02:00:00   1626.833333           406.708333          
3               2012-01-01 03:00:00   1350.333333           337.583333          
4               2012-01-01 04:00:00   1424.000000           356.000000          
...             ...                   ...                   ...                 
105187          2023-12-31 19:00:00   3301.758391           763.500000          
105188          2023-12-31 20:00:00   2322.399977           592.500000              
105189          2023-12-31 21:00:00   3393.225006           861.000000              
105190          2023-12-31 22:00:00   3784.991643           952.000000              
105191          2023-12-31 23:00:00   3661.591654           950.500000          
105192 rows × 5 columns

I tried the approach from this post, since it is basically the same problem, but it didn't work: Pandas, how to calculate mean values of the past n years for every month

I also tried asking ChatGPT, but its suggestion didn't work either:

final_df.reset_index(inplace=True)

final_df['Year'] = final_df['Timestamp'].dt.year
final_df['Month'] = final_df['Timestamp'].dt.month

df_new = final_df.groupby(['Year', 'Month']).sum()

df_new.reset_index(inplace=True)

display(df_new)

I am expecting something like this:

(The values are placeholders; I don't know what the real ones would be.)

    Month   brutto Power [kW]   netto Power [kW]    
0   Jan     1501.500000         375.375000          
1   Feb     1488.833333         372.208333          
2   Mar     1626.833333         406.708333          
3   Apr     1350.333333         337.583333          
4   May     1424.000000         356.000000          
... ...     ...                 ...                 
10  Nov     3393.225006         861.000000          
11  Dec     3784.991643         952.000000

Solution

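IIUC, you can group by the month of the timestamp only (not by year) and take the mean. Here df is the question's final_df, and df.pop removes the Timestamp column from df in place. The output below uses only the ten rows shown in the question; with the real data the final dataframe should have 12 rows: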

import calendar

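# Group by month number across all years; pop() drops "Timestamp" from df in place
out = (
    df.groupby(df.pop("Timestamp").dt.month)
    .mean()
    .reset_index()
    .rename(columns={"Timestamp": "Month"})
)
# Map month numbers (1-12) to abbreviated names ("Jan", ..., "Dec")
out["Month"] = out["Month"].apply(lambda x: calendar.month_abbr[x])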

print(out)

Prints:

  Month  bruttoPower [kW]  nettoPower [kW]
0   Jan       1478.300000          369.575
1   Dec       3292.793334          823.900
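
For reference, the attempt in the question groups by both Year and Month and calls sum(), so it returns one row of totals per year-month pair (144 rows for 2012 to 2023) rather than 12 monthly means. Below is a minimal sketch of a variant that leaves final_df untouched. The data is synthetic (random values on an hourly 2012 to 2023 index, made up purely for illustration), and the names stamps, value_cols and baseline are my own, not from the question:

```python
import calendar

import numpy as np
import pandas as pd

# Synthetic stand-in for final_df: hourly rows for 2012-2023, made-up values.
stamps = pd.date_range(
    "2012-01-01 00:00", "2023-12-31 23:00", freq=pd.Timedelta(hours=1)
)
rng = np.random.default_rng(0)
brutto = rng.uniform(1000, 4000, len(stamps))
final_df = pd.DataFrame(
    {
        "Timestamp": stamps,
        "bruttoPower [kW]": brutto,
        "nettoPower [kW]": brutto / 4,
    }
)

value_cols = ["bruttoPower [kW]", "nettoPower [kW]"]

# Ensure the column is datetime64; this changes nothing if it already is.
timestamps = pd.to_datetime(final_df["Timestamp"])

# Group by month number only, so every January from every year is pooled, etc.
baseline = (
    final_df.groupby(timestamps.dt.month.rename("Month"))[value_cols]
    .mean()
    .reset_index()
)
# Map month numbers (1-12) to abbreviated names.
baseline["Month"] = baseline["Month"].map(lambda m: calendar.month_abbr[m])

print(baseline)
```

Selecting value_cols explicitly also keeps helper columns such as Year or index (left over from earlier reset_index calls) out of the mean.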