
Pandas: Generate date intervals between two dates with yearly reset


I am trying to generate 8-day intervals between two dates using pandas.date_range. In addition, when an 8-day interval runs past the end of the year (day 365 or 366), I would like the next interval to start on the first day of the following year. Below is example code covering just two years; however, I plan to use it across several years, e.g., 2014-01-01 to 2021-01-01.

import pandas as pd

print(pd.date_range(start='2018-12-01', end='2019-01-31', freq='8D'))

This results in:

DatetimeIndex(['2018-12-01', '2018-12-09', '2018-12-17', '2018-12-25',
               '2019-01-02', '2019-01-10', '2019-01-18', '2019-01-26'],
              dtype='datetime64[ns]', freq='8D')

However, I would like the intervals in 2019 to restart on the first day of the year, i.e., 2019-01-01.


Solution

  • You could loop over the years, creating a date_range that runs up to the start of the next year each time and appending it to the result until you reach the end date.

    import pandas as pd
    from datetime import date
    
    def date_range_with_resets(start, end, freq):
      start = date.fromisoformat(start)
      end = date.fromisoformat(end)
      result = pd.DatetimeIndex([]) # start with an empty index
      next_year_start = date(start.year + 1, 1, 1)
      while next_year_start <= end:
        segment = pd.date_range(start=start, end=next_year_start, freq=freq)
        # keep only dates before Jan 1; the next segment starts on Jan 1,
        # so keeping it here would duplicate it whenever a step lands on it
        result = result.append(segment[segment < pd.Timestamp(next_year_start)])
        start = next_year_start
        next_year_start = date(start.year + 1, 1, 1)
      result = result.append(pd.date_range(start=start, end=end, freq=freq))
      return result
    
    start = '2018-12-01'
    end = '2019-01-31'
    print(date_range_with_resets(start, end, freq='8D'))
    

    Edit: Here's a simpler way without using datetime. Create a date_range of the year starts (January 1) between start and end, then loop through those.

    def date_range_with_resets(start, end, freq):
      years = pd.date_range(start=start, end=end, freq='YS') # YS = year start (Jan 1)
      if len(years) == 0: # no Jan 1 between start and end, so nothing to reset
        return pd.date_range(start=start, end=end, freq=freq)
      first = pd.date_range(start=start, end=years[0], freq=freq)
      result = first[first < years[0]] # drop Jan 1 here; the next segment starts on it
      for i in range(len(years) - 1):
        segment = pd.date_range(start=years[i], end=years[i+1], freq=freq)
        result = result.append(segment[segment < years[i+1]]) # avoid a duplicate Jan 1
      result = result.append(pd.date_range(start=years[-1], end=end, freq=freq))
      return result
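
For comparison, here is a minimal self-contained sketch of the same idea written per calendar year: each year's segment is clipped to that year, so the stepping restarts on January 1. The helper name `date_range_yearly_reset` is hypothetical (it is not a pandas function), and the sketch assumes a daily or coarser `freq`, since each segment ends at midnight on December 31.

```python
import pandas as pd

def date_range_yearly_reset(start, end, freq="8D"):
    """Hypothetical helper: restart the `freq` stepping on every Jan 1."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    segments = []
    for year in range(start.year, end.year + 1):
        # Clip each segment to the part of [start, end] inside this year
        seg_start = max(start, pd.Timestamp(year=year, month=1, day=1))
        seg_end = min(end, pd.Timestamp(year=year, month=12, day=31))
        segments.append(pd.date_range(seg_start, seg_end, freq=freq))
    # Flatten the per-year segments into a single DatetimeIndex
    return pd.DatetimeIndex([ts for seg in segments for ts in seg])

result = date_range_yearly_reset("2018-12-01", "2019-01-31")
print(result.strftime("%Y-%m-%d").tolist())
# ['2018-12-01', '2018-12-09', '2018-12-17', '2018-12-25',
#  '2019-01-01', '2019-01-09', '2019-01-17', '2019-01-25']
```

Because each segment is built independently, the combined index has no single `freq`; if you need one row per interval, you can pass the result to a DataFrame as an ordinary column or index.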