
Resample to 10D, but end the last period at the end of each month


I would like to resample a DataFrame with a frequency of 10D, but always cut the last 10-day period ("decade") at the end of the month. E.g.:

print(df)
            data
index
2010-01-01  145.08
2010-01-02  143.69
2010-01-03  101.06
2010-01-04  57.63
2010-01-05  65.46
...
2010-02-24  48.06
2010-02-25  87.41
2010-02-26  71.97
2010-02-27  73.1
2010-02-28  41.43

Applying something like a hypothetical df.resample('10DM').mean() should give:

           data
index
2010-01-10  97.33
2010-01-20  58.58
2010-01-31  41.43
2010-02-10  35.17
2010-02-20  32.44
2010-02-28  55.44

Note that the 1st and 2nd periods are a normal 10D resample, but the 3rd can be 8, 9, 10, or 11 days long, depending on the month and year.
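The rule above can be pinned down with a minimal sketch that uses only the standard library: it maps a date to the last day of its 10-day period (often called a "dekad") and computes how long the third period is. The names dekad_end and third_dekad_length are hypothetical helpers for illustration, not part of pandas.

```python
import calendar
from datetime import date


def dekad_end(d: date) -> date:
    """Return the last day of the 10-day period containing d (hypothetical helper).

    Days 1-10 and 11-20 are fixed 10-day periods; the third period runs
    from day 21 to the last day of the month, so its length varies.
    """
    if d.day <= 10:
        return d.replace(day=10)
    if d.day <= 20:
        return d.replace(day=20)
    last_day = calendar.monthrange(d.year, d.month)[1]  # days in this month
    return d.replace(day=last_day)


def third_dekad_length(year: int, month: int) -> int:
    """Number of days from the 21st to the end of the month (hypothetical helper)."""
    return calendar.monthrange(year, month)[1] - 20


# February 2010 (28 days), February 2012 (leap year), April 2010, January 2010
for y, m in [(2010, 2), (2012, 2), (2010, 4), (2010, 1)]:
    print(f"{y}-{m:02d}: {third_dekad_length(y, m)} days")
# prints 8, 9, 10 and 11 days respectively
```

This is where the 8/9/10/11-day variation comes from: only the third period depends on the number of days in the month.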

Thanks in advance.


Solution

  • Sample data (easy to check):

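    import numpy as np
    import pandas as pd

    # Daily index for January and February 2010; value = 1, 2, ..., 59
    dti = pd.date_range("2010-01-01", "2010-02-28", freq="D")
    df = pd.DataFrame({"value": np.arange(1, len(dti) + 1)}, index=dti)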
    >>> df
                value
    2010-01-01      1
    2010-01-02      2
    2010-01-03      3
    2010-01-04      4
    2010-01-05      5
    ...
    2010-02-24     55
    2010-02-25     56
    2010-02-26     57
    2010-02-27     58
    2010-02-28     59
    

    Create groups by day range (1-10, 11-20, 21 to end of month), month, and year:

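    # Day bins (0, 10], (10, 20], (20, 31] -> 1st, 2nd, 3rd 10-day period of the month
    grp = df.groupby([pd.cut(df.index.day, [0, 10, 20, 31]),
                      pd.Grouper(freq='M'),    # month end ('ME' in pandas >= 2.2)
                      pd.Grouper(freq='Y')],   # year end ('YE' in pandas >= 2.2)
                     observed=True)            # keep only combinations present in the data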
    

    Now compute the mean of each group, labelling each group by the last date it contains:

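    # For each group return (last date in the group, mean value), expand the
    # tuples into two columns, then use the last date as the new index.
    out = grp['value'].apply(lambda x: (x.index.max(), x.mean())).apply(pd.Series) \
                      .reset_index(drop=True).rename(columns={0: 'date', 1: 'value'}) \
                      .set_index('date').sort_index()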
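The apply/apply(pd.Series)/rename chain can, as far as I can tell, be expressed more directly with named aggregation: copy the index into a column, then take its max alongside the mean. This is a sketch of the same "label by last observed date" idea, not the answer's original code; it swaps the pd.Grouper keys for df.index.year and df.index.month (an assumption made so the sketch does not depend on the 'M'/'ME' alias change between pandas versions).

```python
import numpy as np
import pandas as pd

# Same sample data: daily values 1..59 from 2010-01-01 to 2010-02-28
dti = pd.date_range("2010-01-01", "2010-02-28", freq="D")
df = pd.DataFrame({"value": np.arange(1, len(dti) + 1)}, index=dti)

# Group keys: year, month, and day bin (1-10, 11-20, 21 to end of month)
keys = [df.index.year, df.index.month, pd.cut(df.index.day, [0, 10, 20, 31])]

out = (
    df.assign(date=df.index)              # copy the index into a column so it can be aggregated
      .groupby(keys, observed=True)       # only day-bin/month/year combinations that occur
      .agg(date=("date", "max"),          # last date present in each group
           value=("value", "mean"))       # mean value of each group
      .set_index("date")                  # replace the group keys with the last date
      .sort_index()
)
print(out)
```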
    

    Output:

    >>> out
                value
    date
    2010-01-10    5.5
    2010-01-20   15.5
    2010-01-31   26.0
    2010-02-10   36.5
    2010-02-20   46.5
    2010-02-28   55.5
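Because the answer labels each group with x.index.max(), a period whose last days are missing from the data gets an earlier label (e.g. 2010-02-27 if Feb 28 is absent). If you would rather label by the calendar end of each period, one alternative sketch (my own variant, not the answer's method) is to compute the period-end date for every row and group on that single key. It relies on DatetimeIndex.days_in_month and on pd.to_datetime assembling dates from year/month/day columns.

```python
import numpy as np
import pandas as pd

# Same sample data: daily values 1..59 from 2010-01-01 to 2010-02-28
dti = pd.date_range("2010-01-01", "2010-02-28", freq="D")
df = pd.DataFrame({"value": np.arange(1, len(dti) + 1)}, index=dti)

idx = df.index
# Last day of each row's 10-day period: the 10th, the 20th, or the month's last day
end_day = np.where(idx.day <= 10, 10,
                   np.where(idx.day <= 20, 20, idx.days_in_month))

# Assemble period-end dates; .to_numpy() makes groupby use the values
# positionally instead of aligning a Series on its RangeIndex.
labels = pd.to_datetime(
    pd.DataFrame({"year": idx.year, "month": idx.month, "day": end_day})
).to_numpy()

out = df.groupby(labels).mean().rename_axis("date")
print(out)
```

On this complete sample both approaches give the same table; they differ only when the data has gaps at the end of a period.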