pandas dataframe mean series

Average several yearly DataFrames, hour by hour


I have several DataFrames of values recorded every hour, one DataFrame per year, like this:

df1
Out[6]: 
                     time    P  G(i)  H_sun   T2m  WS10m  Int
0     2005-01-01 00:10:00  0.0   0.0    0.0  0.68   2.11  0.0
1     2005-01-01 01:10:00  0.0   0.0    0.0  0.38   2.11  0.0
2     2005-01-01 02:10:00  0.0   0.0    0.0  0.08   2.11  0.0
3     2005-01-01 03:10:00  0.0   0.0    0.0 -0.22   2.11  0.0
4     2005-01-01 04:10:00  0.0   0.0    0.0  0.06   2.21  0.0
                  ...  ...   ...    ...   ...    ...  ...
8755  2005-12-31 19:10:00  0.0   0.0    0.0  1.75   1.71  0.0
8756  2005-12-31 20:10:00  0.0   0.0    0.0  1.49   1.71  0.0
8757  2005-12-31 21:10:00  0.0   0.0    0.0  1.23   1.70  0.0
8758  2005-12-31 22:10:00  0.0   0.0    0.0  0.95   1.65  0.0
8759  2005-12-31 23:10:00  0.0   0.0    0.0  0.67   1.60  0.0

[8760 rows x 7 columns]

df2
Out[7]: 
                      time    P  G(i)  H_sun   T2m  WS10m  Int
8760   2006-01-01 00:10:00  0.0   0.0    0.0  0.39   1.56  0.0
8761   2006-01-01 01:10:00  0.0   0.0    0.0  0.26   1.52  0.0
8762   2006-01-01 02:10:00  0.0   0.0    0.0  0.13   1.49  0.0
8763   2006-01-01 03:10:00  0.0   0.0    0.0  0.01   1.45  0.0
8764   2006-01-01 04:10:00  0.0   0.0    0.0 -0.45   1.65  0.0
                   ...  ...   ...    ...   ...    ...  ...
17515  2006-12-31 19:10:00  0.0   0.0    0.0  4.24   1.32  0.0
17516  2006-12-31 20:10:00  0.0   0.0    0.0  4.00   1.32  0.0
17517  2006-12-31 21:10:00  0.0   0.0    0.0  3.75   1.32  0.0
17518  2006-12-31 22:10:00  0.0   0.0    0.0  4.34   1.54  0.0
17519  2006-12-31 23:10:00  0.0   0.0    0.0  4.92   1.76  0.0

[8760 rows x 7 columns]

and so on, for 10 years.

I'm trying to average the values recorded at "20XX-01-01 00:10:00" across all years, i.e. the mean of every value for 1 January at 00:10, and likewise for every other hour of the year. Ideally the time column would be merged so that it shows just "01-01 00:10:00". Is this possible?

So far I only know df.mean(), which reduces a whole column to a single value, and that's not what I want.


Solution

  • Join all the DataFrames together with concat:

    import pandas as pd

    df = pd.concat([df1, df2, df3, ..., df10])
    

    Then map every timestamp onto the same year (e.g. 2005) and aggregate with mean:

    df['time'] = pd.to_datetime(df['time'])
    
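    # uncomment to remove 29 Feb; needed if the data contains it,
    # because '2005-02-29' is not a valid date and to_datetime would raise
    #df = df[((df['time'].dt.month != 2) | (df['time'].dt.day != 29))]
    # rewrite each timestamp with the year 2005 so that equal month/day/time values share one group
    df1 = df.groupby(pd.to_datetime(df['time'].dt.strftime('2005-%m-%d %H:%M:%S'))).mean()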
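
If you want the result labelled with just "01-01 00:10:00", as asked in the question, one possible variation is to group on a year-less string key instead of a timestamp forced into 2005. The following is only a minimal sketch under some assumptions: the names `df_2005`, `df_2006` and `hourly_mean` are hypothetical, the two small frames reuse the first rows of `T2m` and `WS10m` from the question as stand-ins for the real yearly data, and only numeric columns are averaged.

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2: first two rows of the question's data.
df_2005 = pd.DataFrame({
    "time": ["2005-01-01 00:10:00", "2005-01-01 01:10:00"],
    "T2m": [0.68, 0.38],
    "WS10m": [2.11, 2.11],
})
df_2006 = pd.DataFrame({
    "time": ["2006-01-01 00:10:00", "2006-01-01 01:10:00"],
    "T2m": [0.39, 0.26],
    "WS10m": [1.56, 1.52],
})

# Stack the yearly frames and parse the timestamps.
df = pd.concat([df_2005, df_2006], ignore_index=True)
df["time"] = pd.to_datetime(df["time"])

# Optional: drop 29 Feb so that every key is present in every year.
is_feb_29 = (df["time"].dt.month == 2) & (df["time"].dt.day == 29)
df = df[~is_feb_29]

# Year-less group key such as "01-01 00:10:00"; zero-padded strings
# sort in chronological order within a year.
key = df["time"].dt.strftime("%m-%d %H:%M:%S").rename("time")

# Average every remaining (numeric) column per month-day-time key.
hourly_mean = df.drop(columns="time").groupby(key).mean()
print(hourly_mean)
```

Because the key is a plain string, a 29 Feb row would not raise an error here; it would simply form its own group averaged over the leap years only, so whether to drop it is a choice about the data rather than a requirement.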