I have several DataFrames of values taken every hour over several years, like this:
df1
Out[6]:
time P G(i) H_sun T2m WS10m Int
0 2005-01-01 00:10:00 0.0 0.0 0.0 0.68 2.11 0.0
1 2005-01-01 01:10:00 0.0 0.0 0.0 0.38 2.11 0.0
2 2005-01-01 02:10:00 0.0 0.0 0.0 0.08 2.11 0.0
3 2005-01-01 03:10:00 0.0 0.0 0.0 -0.22 2.11 0.0
4 2005-01-01 04:10:00 0.0 0.0 0.0 0.06 2.21 0.0
... ... ... ... ... ... ...
8755 2005-12-31 19:10:00 0.0 0.0 0.0 1.75 1.71 0.0
8756 2005-12-31 20:10:00 0.0 0.0 0.0 1.49 1.71 0.0
8757 2005-12-31 21:10:00 0.0 0.0 0.0 1.23 1.70 0.0
8758 2005-12-31 22:10:00 0.0 0.0 0.0 0.95 1.65 0.0
8759 2005-12-31 23:10:00 0.0 0.0 0.0 0.67 1.60 0.0
[8760 rows x 7 columns]
df2
Out[7]:
time P G(i) H_sun T2m WS10m Int
8760 2006-01-01 00:10:00 0.0 0.0 0.0 0.39 1.56 0.0
8761 2006-01-01 01:10:00 0.0 0.0 0.0 0.26 1.52 0.0
8762 2006-01-01 02:10:00 0.0 0.0 0.0 0.13 1.49 0.0
8763 2006-01-01 03:10:00 0.0 0.0 0.0 0.01 1.45 0.0
8764 2006-01-01 04:10:00 0.0 0.0 0.0 -0.45 1.65 0.0
... ... ... ... ... ... ...
17515 2006-12-31 19:10:00 0.0 0.0 0.0 4.24 1.32 0.0
17516 2006-12-31 20:10:00 0.0 0.0 0.0 4.00 1.32 0.0
17517 2006-12-31 21:10:00 0.0 0.0 0.0 3.75 1.32 0.0
17518 2006-12-31 22:10:00 0.0 0.0 0.0 4.34 1.54 0.0
17519 2006-12-31 23:10:00 0.0 0.0 0.0 4.92 1.76 0.0
[8760 rows x 7 columns]
and so on, for 10 years.
I'm trying to compute the mean of the values at "20XX-01-01 00:10:00" across all years, i.e. "the mean of all values for 1 January at 00:10". Ideally the time column would be collapsed to just "01-01 00:10:00". Is this possible?
So far I only know df.mean(), which averages all the values of a column into a single result, and that's not what I want.
Join all the DataFrames together with concat:
import pandas as pd
df = pd.concat([df1, df2, df3, ..., df10])
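A minimal, self-contained sketch of the concatenation step, using two tiny hypothetical frames built from the first rows shown in the question (the real ones have 8760 rows each). `ignore_index=True` is an optional choice here, since the original indices already continue from one year to the next:

```python
import pandas as pd

# Hypothetical stand-ins for the yearly DataFrames (first two hours of each year)
df1 = pd.DataFrame({"time": ["2005-01-01 00:10:00", "2005-01-01 01:10:00"],
                    "T2m": [0.68, 0.38]})
df2 = pd.DataFrame({"time": ["2006-01-01 00:10:00", "2006-01-01 01:10:00"],
                    "T2m": [0.39, 0.26]})

frames = [df1, df2]  # in practice: all ten yearly DataFrames
# Stack the frames vertically and renumber the rows 0..n-1
df = pd.concat(frames, ignore_index=True)
print(df)
```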
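Then compute the mean after mapping every timestamp onto the same year, e.g. 2005, so that rows sharing the same month, day and time fall into one group: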
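df['time'] = pd.to_datetime(df['time'])
# uncomment to drop 29 February if the data contains leap years (e.g. 2008): '2005-02-29' is not a valid date
#df = df[((df['time'].dt.month != 2) | (df['time'].dt.day != 29))]
# numeric_only=True leaves the original datetime 'time' column out of the average
df_mean = df.groupby(pd.to_datetime(df['time'].dt.strftime('2005-%m-%d %H:%M:%S'))).mean(numeric_only=True)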
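If you would rather have the "01-01 00:10:00" string from the question as the index instead of a datetime in a placeholder year, one variant (a sketch, not tested on the full ten-year data) is to group by a formatted string. The sample frame below is hypothetical, built from the first rows shown above. Because the key is never parsed back into a date, 29 February needs no special handling: it simply forms its own group containing only the leap-year values.

```python
import pandas as pd

# Hypothetical sample: the first two hours of 2005 and 2006 (values from the question)
df = pd.DataFrame({
    "time": ["2005-01-01 00:10:00", "2005-01-01 01:10:00",
             "2006-01-01 00:10:00", "2006-01-01 01:10:00"],
    "T2m": [0.68, 0.38, 0.39, 0.26],
    "WS10m": [2.11, 2.11, 1.56, 1.52],
})
df["time"] = pd.to_datetime(df["time"])

# Year-free key such as "01-01 00:10:00"; named "time" so it becomes the index name
key = df["time"].dt.strftime("%m-%d %H:%M:%S").rename("time")

# Average every value column over all years that share the same key
df_mean = df.drop(columns="time").groupby(key).mean()
print(df_mean)
```

Since the key has the form "MM-DD HH:MM:SS", the default lexicographic sort of the group keys matches the chronological order within a year.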