Search code examples
pythonpandasdatetimeoffsetdatetimeoffset

Why does DateOffset.rollback() not work the way I expect it to with days / hours?


I'm trying to move an arbitrary input pd.Timestamp back to the correct start offset unless it is already on offset.

The following code works with BusinessMonthEnd, MonthEnd, MonthStart, and so on:

import pandas as pd
from pandas.tseries.offsets import *


def to_start(t: pd.Timestamp, freq: pd.DateOffset) -> pd.Timestamp:
    return freq.rollback(t)


assert to_start(pd.to_datetime("2021-09-27"), MonthEnd()) == pd.to_datetime("2021-08-31")
assert to_start(pd.to_datetime("2021-09-27"), MonthBegin()) == pd.to_datetime("2021-08-31")
assert to_start(pd.to_datetime("2021-08-27"), BMonthEnd()) == pd.to_datetime("2021-07-30")
assert to_start(pd.to_datetime("2021-08-27"), YearBegin()) == pd.to_datetime("2021-01-01")
assert to_start(pd.to_datetime("2021-08-27"), BYearBegin()) == pd.to_datetime("2021-01-01")

# also it works nicely with holiday calendars
from pandas.tseries.holiday import USFederalHolidayCalendar

us_fed_biz_days = CustomBusinessDay(calendar=USFederalHolidayCalendar())
memorial_day = pd.to_datetime("2021-05-31")
the_friday_before_memorial_day = pd.to_datetime("2021-05-28")
assert to_start(memorial_day, us_fed_biz_days) == the_friday_before_memorial_day

However (and this is driving me crazy) it doesn't seem to work for Day, BusinessDay, Week, Hour etc:

assert to_start(pd.to_datetime("2021-08-27 05:00"), Day()) == pd.to_datetime("2021-08-27")
assert to_start(pd.to_datetime("2021-08-27 05:00"), BDay()) == pd.to_datetime("2021-08-27")
assert to_start(pd.to_datetime("2021-08-27 05:15"), Hour()) == pd.to_datetime("2021-08-27 05:00")
assert to_start(pd.to_datetime("2021-08-26"), pd.tseries.frequencies.to_offset("W-MON")) == pd.to_datetime("2021-08-24")

I have also tried this:

def to_start(t: pd.Timestamp, freq: pd.DateOffset) -> pd.Timestamp:
        return pd.Period(t, freq=freq).start_time

which ironically works with the second set of assertions but not the first?

Is my expectation of the above assertions not failing unreasonable, and if so what am I missing, please?


Solution

  • From your examples, I take it that you want to use fixed frequencies (e.g. an hour) to floor the given date/time while variable frequencies (e.g. business month end) should call .rollback if the check given in .is_on_offset returns True (see source code linked in the questions' comment section).

    Ex:

    def to_start(t: pd.Timestamp, freq: pd.DateOffset) -> pd.Timestamp:
        try:
            return t.floor(freq) # fixed frequencies should just floor the date/time
        except ValueError: # if freq is variable, we fall into here...
            return freq.rollback(t.floor("D"))
    

    Tests:

    # variable offsets that depend on the date
    assert to_start(pd.to_datetime("2021-09-27"), pd.tseries.offsets.MonthEnd()) == pd.to_datetime("2021-08-31")
    assert to_start(pd.to_datetime("2021-09-27"), pd.tseries.offsets.MonthBegin()) == pd.to_datetime("2021-09-01")
    assert to_start(pd.to_datetime("2021-08-27"), pd.tseries.offsets.BMonthEnd()) == pd.to_datetime("2021-07-30")
    assert to_start(pd.to_datetime("2021-08-27"), pd.tseries.offsets.YearBegin()) == pd.to_datetime("2021-01-01")
    assert to_start(pd.to_datetime("2021-08-27"), pd.tseries.offsets.BYearBegin()) == pd.to_datetime("2021-01-01")
    assert to_start(pd.to_datetime("2021-08-26"), pd.tseries.frequencies.to_offset("W-MON")) == pd.to_datetime("2021-08-23")
    assert to_start(pd.to_datetime("2021-08-28 05:00"), pd.tseries.offsets.BDay()) == pd.to_datetime("2021-08-27")
    # fixed offsets (e.g. an hour is always an hour)
    assert to_start(pd.to_datetime("2021-08-27 05:00"), pd.tseries.offsets.Day()) == pd.to_datetime("2021-08-27")
    assert to_start(pd.to_datetime("2021-08-27 05:15"), pd.tseries.offsets.Hour()) == pd.to_datetime("2021-08-27 05:00")