How can I group a time series with 1 year intervals such that the start of the first interval is the first datapoint and the new series is labeled by that starting point?
E.g. here I have a series that starts at 2000-01-11, so the first interval should have all datapoints between 2000-01-11 and 2001-01-10, the second between 2001-01-11 and 2002-01-10, etc.; the labels of the new series should be 2000-01-11, 2001-01-11, etc.
import pandas as pd
import numpy as np
i = pd.date_range('2000-01-11', '2022-02-10', freq='D')
t = pd.Series(index=i, data=np.random.randint(0,100,len(i)))
print(t)
t.groupby(pd.Grouper(freq='1Y', origin='start', label='left')).mean()
This code seems to start each bin at the start of the calendar year and label it by the end of the year.
IIUC, you can use pd.cut and group by these categories:
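# Bin edges: year ends from 1999-12-31, shifted by 11 days to land on
# 2000-01-11, 2001-01-11, ... (newer pandas versions spell "12M" as "12ME")
x = pd.cut(
    i,
    pd.date_range(start="1999-12-31", end="2022-02-10", freq="12M")
    + pd.offsets.DateOffset(11),
    right=False,  # intervals are closed on the left: [start, next start)
    include_lowest=True,
)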
out = t.groupby(x).mean()
print(out)
Prints:
[2000-01-11, 2001-01-11) 51.174863
[2001-01-11, 2002-01-11) 48.197260
[2002-01-11, 2003-01-11) 49.400000
[2003-01-11, 2004-01-11) 50.509589
[2004-01-11, 2005-01-11) 49.680328
[2005-01-11, 2006-01-11) 48.334247
[2006-01-11, 2007-01-11) 47.882192
[2007-01-11, 2008-01-11) 51.405479
[2008-01-11, 2009-01-11) 50.437158
[2009-01-11, 2010-01-11) 49.520548
[2010-01-11, 2011-01-11) 48.591781
[2011-01-11, 2012-01-11) 51.643836
[2012-01-11, 2013-01-11) 51.084699
[2013-01-11, 2014-01-11) 50.334247
[2014-01-11, 2015-01-11) 51.109589
[2015-01-11, 2016-01-11) 48.230137
[2016-01-11, 2017-01-11) 49.691257
[2017-01-11, 2018-01-11) 47.326027
[2018-01-11, 2019-01-11) 48.728767
[2019-01-11, 2020-01-11) 47.947945
[2020-01-11, 2021-01-11) 48.866120
[2021-01-11, 2022-01-11) 49.268493
dtype: float64
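Note that the result above is labeled by intervals rather than by starting dates, and that the last edge is 2022-01-11, so the datapoints from 2022-01-11 through 2022-02-10 fall outside every bin. A minimal sketch of a variant that labels each group by its starting date and keeps the final partial interval follows. It is not part of the original answer: the names start, end, edges, pos and labels are illustrative, the random seed is arbitrary, and it assumes a sorted index whose first datapoint is not February 29 (that case would need separate handling).

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question; the seed is an arbitrary choice.
i = pd.date_range("2000-01-11", "2022-02-10", freq="D")
rng = np.random.default_rng(0)
t = pd.Series(rng.integers(0, 100, len(i)), index=i)

start, end = t.index[0], t.index[-1]

# Anniversary edges: 2000-01-11, 2001-01-11, ..., one per calendar year spanned.
edges = pd.date_range(
    start,
    periods=end.year - start.year + 1,
    freq=pd.DateOffset(years=1),
)

# For each timestamp, find the last edge that is <= it and use it as the label.
pos = np.searchsorted(edges.values, t.index.values, side="right") - 1
labels = edges[pos]

out = t.groupby(labels).mean()
print(out)
```

Here out is indexed by 2000-01-11, 2001-01-11, and so on, with a last group starting at 2022-01-11 that covers only the remaining days of data. If that partial group is unwanted, it can be dropped with out.iloc[:-1].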