Tags: python, pandas, dataframe, group-by, time-series

Calculate mean values of the past x years for every month


I have hourly historical data from 2012 to 2023. For every calendar month, I am trying to calculate the average over all these years to build a 'reference year' (a 'baseline'). My DataFrame (final_df) looks like this:

                Timestamp             bruttoPower [kW]      nettoPower [kW]     
0               2012-01-01 00:00:00   1501.500000           375.375000          
1               2012-01-01 01:00:00   1488.833333           372.208333          
2               2012-01-01 02:00:00   1626.833333           406.708333          
3               2012-01-01 03:00:00   1350.333333           337.583333          
4               2012-01-01 04:00:00   1424.000000           356.000000          
...             ...                   ...                   ...                 
105187          2023-12-31 19:00:00   3301.758391           763.500000          
105188          2023-12-31 20:00:00   2322.399977           592.500000              
105189          2023-12-31 21:00:00   3393.225006           861.000000              
105190          2023-12-31 22:00:00   3784.991643           952.000000              
105191          2023-12-31 23:00:00   3661.591654           950.500000          
105192 rows × 5 columns
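The row count is consistent with exactly one row per hour: 2012 to 2023 is 12 years of 365 days plus 3 leap days (2012, 2016, 2020), i.e. 4383 days × 24 h = 105192 rows. The sketch below builds a stand-in frame of the same shape for experimenting. It assumes a datetime64 "Timestamp" column and the two power columns shown; the values are random placeholders, not the real data, and the two extra columns of the real frame are not reproduced.

```python
import numpy as np
import pandas as pd

# One timestamp per hour, from the first to the last row shown in the question
timestamps = pd.date_range("2012-01-01 00:00", "2023-12-31 23:00", freq="h")

# Placeholder values only: random numbers standing in for the real measurements
rng = np.random.default_rng(0)
final_df = pd.DataFrame(
    {
        "Timestamp": timestamps,
        "bruttoPower [kW]": rng.uniform(1000, 4000, len(timestamps)),
        "nettoPower [kW]": rng.uniform(250, 1000, len(timestamps)),
    }
)

# 12 * 365 days + 3 leap days = 4383 days, times 24 hours
print(len(final_df))  # 105192
```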

I tried the approach from this post, since it is basically the same problem, but it didn't work: Pandas, how to calculate mean values of the past n years for every month

I also tried asking ChatGPT, but its suggestion didn't work either:

final_df.reset_index(inplace=True)

final_df['Year'] = final_df['Timestamp'].dt.year
final_df['Month'] = final_df['Timestamp'].dt.month

df_new = final_df.groupby(['Year', 'Month']).sum()

df_new.reset_index(inplace=True)

display(df_new)
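The attempt above groups by both Year and Month and calls .sum(), so it returns one total per (year, month) pair (144 rows for 12 years), not 12 averages. A minimal sketch of a two-step variant follows, run on random placeholder data shaped like final_df (an assumption, not the real data): first take the mean of each month within each year, then average those monthly means across the years. This weights every year equally; it can differ slightly from pooling all hourly values per month when months have different numbers of hours across years (February in leap years) or when data is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for final_df: hourly random placeholder values, 2012-2023
timestamps = pd.date_range("2012-01-01", "2023-12-31 23:00", freq="h")
rng = np.random.default_rng(1)
final_df = pd.DataFrame(
    {
        "Timestamp": timestamps,
        "bruttoPower [kW]": rng.uniform(1000, 4000, len(timestamps)),
        "nettoPower [kW]": rng.uniform(250, 1000, len(timestamps)),
    }
)

final_df["Year"] = final_df["Timestamp"].dt.year
final_df["Month"] = final_df["Timestamp"].dt.month
value_cols = ["bruttoPower [kW]", "nettoPower [kW]"]

# Step 1: mean (not sum) per (Year, Month) -> 12 years x 12 months = 144 rows
monthly = final_df.groupby(["Year", "Month"])[value_cols].mean()

# Step 2: average the monthly means over the years -> one row per calendar month
baseline = monthly.groupby(level="Month").mean().reset_index()

print(len(monthly), len(baseline))  # 144 12
print(baseline)
```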

I am expecting something like this:

(The values below are placeholders; I don't know what the actual averages would be.)

    Month   bruttoPower [kW]    nettoPower [kW]     
0   Jan     1501.500000         375.375000          
1   Feb     1488.833333         372.208333          
2   Mar     1626.833333         406.708333          
3   Apr     1350.333333         337.583333          
4   May     1424.000000         356.000000          
... ...     ...                 ...                 
10  Nov     3393.225006         861.000000          
11  Dec     3784.991643         952.000000          
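The title asks for the mean over the past x years, while the expected output above averages all available years. A hedged sketch of a helper that first keeps only the most recent n_years calendar years and then averages each calendar month is shown below. The function name monthly_baseline and its n_years parameter are hypothetical, and the data is a random placeholder frame shaped like final_df.

```python
import calendar

import numpy as np
import pandas as pd


def monthly_baseline(df: pd.DataFrame, n_years: int) -> pd.DataFrame:
    """Hypothetical helper: mean of each calendar month over the last n_years years."""
    years = df["Timestamp"].dt.year
    recent = df[years > years.max() - n_years]  # e.g. n_years=5 keeps 2019..2023
    out = (
        recent.drop(columns="Timestamp")
        .groupby(recent["Timestamp"].dt.month.rename("Month"))
        .mean(numeric_only=True)
        .reset_index()
    )
    # Month number -> abbreviated name, e.g. 1 -> "Jan"
    out["Month"] = out["Month"].map(lambda m: calendar.month_abbr[m])
    return out


# Placeholder data shaped like the question's final_df (values are random)
timestamps = pd.date_range("2012-01-01", "2023-12-31 23:00", freq="h")
rng = np.random.default_rng(2)
final_df = pd.DataFrame(
    {
        "Timestamp": timestamps,
        "bruttoPower [kW]": rng.uniform(1000, 4000, len(timestamps)),
        "nettoPower [kW]": rng.uniform(250, 1000, len(timestamps)),
    }
)

print(monthly_baseline(final_df, n_years=5))
```

Passing a large n_years (for example 12 here) keeps every year, which reduces to the plain per-month average.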


Solution

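  • If I understand correctly, you can do the following. This uses only the sample rows from the question, which cover January and December; with the real data, the final DataFrame should have 12 rows, one per month: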

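    import calendar
    
    # df: the question's DataFrame (final_df), with a datetime64 "Timestamp" column
    month = df["Timestamp"].dt.month.rename("Month")
    
    out = (
        df.drop(columns="Timestamp")  # leaves df intact, unlike df.pop("Timestamp")
        .groupby(month)
        .mean(numeric_only=True)  # pool all hours of each calendar month over all years
        .reset_index()
    )
    out["Month"] = out["Month"].map(lambda m: calendar.month_abbr[m])  # 1 -> "Jan"
    
    print(out)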
    

    Prints:

      Month  bruttoPower [kW]  nettoPower [kW]
    0   Jan       1478.300000          369.575
    1   Dec       3292.793334          823.900
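The question's attempt starts with final_df.reset_index(inplace=True), which suggests Timestamp may originally have been the index. If so, a sketch of the same per-month mean can group directly on the month of a DatetimeIndex, with no helper columns. The frame below is a random placeholder with an assumed DatetimeIndex named "Timestamp", not the real data.

```python
import calendar

import numpy as np
import pandas as pd

# Placeholder frame with Timestamp as a DatetimeIndex (random values, not real data)
idx = pd.date_range("2012-01-01", "2023-12-31 23:00", freq="h", name="Timestamp")
rng = np.random.default_rng(3)
df = pd.DataFrame(
    {
        "bruttoPower [kW]": rng.uniform(1000, 4000, len(idx)),
        "nettoPower [kW]": rng.uniform(250, 1000, len(idx)),
    },
    index=idx,
)

# Group on the month number of the index: all years are pooled per calendar month
out = df.groupby(df.index.month).mean()
out.index = [calendar.month_abbr[m] for m in out.index]  # 1 -> "Jan", ..., 12 -> "Dec"
out.index.name = "Month"

print(out)
```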