pandas group-by time-series pandas-resample

Averaging the data across two calendar years and defining the beginning month


I have data covering the period from December 2013 to November 2018, which I converted into a data frame as shown here.

Date        0.1     0.2     0.3     0.4     0.5     0.6
2013-12-01  301.04  297.4   296.63  295.76  295.25  295.25
2013-12-04  297.96  297.15  296.25  295.25  294.43  293.45
2013-12-05  298.4   297.61  296.65  295.81  294.75  293.89
2013-12-08  298.82  297.95  297.15  296.25  295.45  294.41
2013-12-09  298.65  297.65  296.95  296.02  295.13  294.05
2013-12-12  299.05  297.33  296.65  295.81  294.85  293.85
2013-12-16  301.05  300.28  299.38  298.45  297.65  296.51
....
2014-01-10  301.65  297.45  296.46  295.52  294.65  293.56  
2014-01-11  301.99  298.95  298.39  297.15  296.05  295.11  
2014-01-12  299.86  298.65  297.73  296.82  296.35  295.37  
2014-01-13  299.25  298.15  297.3   296.43  295.26  294.31  

I want to compute the monthly mean and the seasonal mean of this data.

For the monthly mean I tried

df.resample('M').mean()

And it worked well.

For the seasons, I would like to split the data into four three-month seasons (Dec-Feb, Mar-May, Jun-Aug and Sep-Nov). I tried resampling with a three-month interval, i.e.

df.resample('3M').mean()

However, this did not work as intended: it gives the average for the starting December on its own and then applies the three-month interval to the calendar year (i.e. January to March and so on).

I would like to know whether there is a way to avoid this by specifying the month in which the period begins.

I would also like to know whether these seasons can be defined beforehand and the data grouped accordingly, so that the averages are easier to compute.


Solution

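  • You can anchor a quarterly frequency on December, so that the three-month bins start in December, March, June and September and each row is labelled with the first day of its season. (The origin argument of resample will not do this: as far as I know it only shifts fixed frequencies such as '30D', not month-based ones.)

    df.resample('QS-DEC').mean()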
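
As a minimal sketch of season-aligned averaging, the example below uses a synthetic daily frame; the data, the single column name '0.1' and the season labels DJF/MAM/JJA/SON are assumptions for illustration, not the asker's actual data. It shows two approaches: resampling with a quarterly frequency anchored on December ('QS-DEC'), and defining the seasons beforehand as a month-to-season mapping that is then used with groupby.

```python
import numpy as np
import pandas as pd

# Synthetic daily data standing in for the question's frame (values are made up).
idx = pd.date_range("2013-12-01", "2014-11-30", freq="D")
df = pd.DataFrame({"0.1": np.arange(len(idx), dtype=float)}, index=idx)

# Approach 1: quarterly bins that start in Dec, Mar, Jun and Sep.
# Each row of the result is labelled with the first day of its season.
seasonal = df.resample("QS-DEC").mean()

# Approach 2: define the seasons beforehand and group on them.
SEASONS = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}
season = pd.Index(df.index.month.map(SEASONS), name="season")

# Count December towards the following year, so that one winter
# (Dec-Feb) is not split across two calendar years.
season_year = pd.Index(
    df.index.year + (df.index.month == 12).astype(int), name="season_year"
)

by_season = df.groupby([season_year, season]).mean()

print(seasonal)
print(by_season)
```

Grouping on season alone (without season_year) would instead pool every winter, spring, summer and autumn in the record into a single average per season, which may or may not be what is wanted.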