Tags: python, pandas, datetime, group-by, intervals

Group by one-year intervals starting at the first datapoint of the series


How can I group a time series into 1-year intervals so that the first interval starts at the first datapoint, and each group in the new series is labeled by its interval's starting date?

For example, the series below starts at 2000-01-11, so the first interval should contain all datapoints from 2000-01-11 through 2001-01-10, the second from 2001-01-11 through 2002-01-10, and so on. The new series should then be labeled 2000-01-11, 2001-01-11, etc.
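To make the intended boundaries concrete: interval k runs from the start date plus k calendar years up to the day before the next anniversary. A small sketch, assuming the first datapoint is 2000-01-11 as in the question, using `pd.DateOffset(years=k)` so the steps are calendar years rather than a fixed 365 days:

```python
import pandas as pd

start = pd.Timestamp("2000-01-11")  # first datapoint of the question's series

intervals = []
for k in range(3):
    lo = start + pd.DateOffset(years=k)                             # interval start, used as the label
    hi = start + pd.DateOffset(years=k + 1) - pd.Timedelta(days=1)  # last day inside the interval
    intervals.append((lo.strftime("%Y-%m-%d"), hi.strftime("%Y-%m-%d")))

for lo, hi in intervals:
    print(lo, "to", hi)
```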

import pandas as pd
import numpy as np

i = pd.date_range('2000-01-11', '2022-02-10', freq='D')
t = pd.Series(index=i, data=np.random.randint(0,100,len(i)))
print(t)

t.groupby(pd.Grouper(freq='1Y', origin='start', label='left')).mean()

This code seems to bin by calendar year, starting each bin at the beginning of the year and labeling it by the year's end.
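A likely reason: according to the pandas resample documentation, `origin` only takes effect for fixed-length (Tick) frequencies such as days, hours or minutes, not for calendar frequencies like months or years, so `'1Y'` stays anchored to calendar year-ends. One way to see `origin='start'` actually working is to switch to a fixed `'365D'` frequency. This is only an illustration on seeded sample data (not the question's exact values), and it also shows why a fixed length is not a true one-year interval: each leap day shifts the following labels one day earlier.

```python
import numpy as np
import pandas as pd

# Sample data shaped like the question's, seeded for repeatability
rng = np.random.default_rng(0)
i = pd.date_range("2000-01-11", "2022-02-10", freq="D")
t = pd.Series(rng.integers(0, 100, len(i)), index=i)

# '365D' is a fixed-length (Tick) frequency, so origin='start' is honoured:
# bins are [first, first + 365 days), [first + 365 days, first + 730 days), ...
out365 = t.groupby(pd.Grouper(freq="365D", origin="start", label="left")).mean()

# Labels start on the first datapoint but drift after every Feb 29
print(out365.index[:6])
```

With this range the first labels are 2000-01-11, 2001-01-10, 2002-01-10, 2003-01-10, 2004-01-10 and then 2005-01-09, so the bins slowly move away from the anniversary date.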


Solution

  • If I understand correctly, you can use pd.cut to bin the index into one-year intervals and group by the resulting categories:

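    # Month-end dates every 12 months (1999-12-31, 2000-12-31, ..., 2021-12-31),
    # shifted by 11 days so each bin edge falls on Jan 11 (2000-01-11, ..., 2022-01-11)
    x = pd.cut(
        i,
        pd.date_range(start="1999-12-31", end="2022-02-10", freq="12M")
        + pd.DateOffset(days=11),
        right=False,          # left-closed bins: [2000-01-11, 2001-01-11), ...
        include_lowest=True
    )
    
    # Dates on or after the last edge (2022-01-11) fall outside every bin, become NaN
    # and are dropped by the groupby, so the final partial year is not in `out`
    out = t.groupby(x).mean()
    print(out)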
    

    Prints:

    [2000-01-11, 2001-01-11)    51.174863
    [2001-01-11, 2002-01-11)    48.197260
    [2002-01-11, 2003-01-11)    49.400000
    [2003-01-11, 2004-01-11)    50.509589
    [2004-01-11, 2005-01-11)    49.680328
    [2005-01-11, 2006-01-11)    48.334247
    [2006-01-11, 2007-01-11)    47.882192
    [2007-01-11, 2008-01-11)    51.405479
    [2008-01-11, 2009-01-11)    50.437158
    [2009-01-11, 2010-01-11)    49.520548
    [2010-01-11, 2011-01-11)    48.591781
    [2011-01-11, 2012-01-11)    51.643836
    [2012-01-11, 2013-01-11)    51.084699
    [2013-01-11, 2014-01-11)    50.334247
    [2014-01-11, 2015-01-11)    51.109589
    [2015-01-11, 2016-01-11)    48.230137
    [2016-01-11, 2017-01-11)    49.691257
    [2017-01-11, 2018-01-11)    47.326027
    [2018-01-11, 2019-01-11)    48.728767
    [2019-01-11, 2020-01-11)    47.947945
    [2020-01-11, 2021-01-11)    48.866120
    [2021-01-11, 2022-01-11)    49.268493
    dtype: float64
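A variant of the same pd.cut idea could derive the bin edges from the first datapoint instead of hard-coding 1999-12-31 and the 11-day shift, keep the final partial year, and label each group by its start date as the question asks. This is a minimal sketch, not tested across pandas versions; `yearly_mean_from_first` is a hypothetical helper name, and the data is regenerated with a seeded NumPy generator, so the numbers will not match the output above.

```python
import numpy as np
import pandas as pd

# Sample data shaped like the question's (seeded, so values differ from the answer's output)
rng = np.random.default_rng(0)
i = pd.date_range("2000-01-11", "2022-02-10", freq="D")
t = pd.Series(rng.integers(0, 100, len(i)), index=i)


def yearly_mean_from_first(s: pd.Series) -> pd.Series:
    """Hypothetical helper: mean over [start + k years, start + (k+1) years),
    where start is the first timestamp, labeled by each interval's start."""
    start, stop = s.index.min(), s.index.max()
    # Enough edges that the last one lands in the year after `stop`,
    # so the final (possibly partial) interval is kept
    n_edges = stop.year - start.year + 2
    # Offset every edge from `start` directly instead of chaining +1 year steps
    edges = pd.DatetimeIndex([start + pd.DateOffset(years=k) for k in range(n_edges)])
    # Left-closed bins, labeled by their left edge instead of by the interval
    bins = pd.cut(s.index, edges, right=False, labels=edges[:-1])
    # observed=True drops empty bins (e.g. a trailing interval with no data)
    return s.groupby(bins, observed=True).mean()


out = yearly_mean_from_first(t)
print(out.head())
print(out.tail(2))
```

Computing each edge as `start + pd.DateOffset(years=k)` rather than by repeatedly adding one year should keep a series that starts on Feb 29 from drifting to Feb 28 permanently after the first non-leap year. The result is indexed by a CategoricalIndex of start dates; here the last group, labeled 2022-01-11, holds the partial interval that the hard-coded edges above leave out.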