Why does pandas `date_range` round up to the next month?


When calling pandas.date_range with a start date, a frequency, and a number of periods, the range rounds up to the next month if the start date is the last day of a month.

It seems like a silent edge-case bug. If it's not a bug, why does it behave this way?

For example

import pandas as pd

start_date = pd.Timestamp(2023, 5, 31)
date_range = pd.date_range(start=start_date, freq="MS", periods=6)

results in

DatetimeIndex(['2023-06-01', '2023-07-01', '2023-08-01', '2023-09-01',
               '2023-10-01', '2023-11-01'],
              dtype='datetime64[ns]', freq='MS')

From the documentation, I'd expect it to start in May and end in October:

DatetimeIndex(['2023-05-01', '2023-06-01', '2023-07-01', '2023-08-01', '2023-09-01',
               '2023-10-01'],
              dtype='datetime64[ns]', freq='MS')

I thought it had to do with the inclusive argument but that's not the reason either.


Solution

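  • pd.date_range generates dates between start and end, so every value it returns is on or after start. With freq="MS" each value must be the first day of a month, and 2023-05-01 is earlier than the start date 2023-05-31, so it can never be part of the range. The first month start on or after 2023-05-31 is 2023-06-01, which is why the range begins there. To get the result you expect, replace the day of the pd.Timestamp with 1 before passing it as start.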

    import pandas as pd

    start_date = pd.Timestamp(2023, 5, 31)
    date_range = pd.date_range(start=start_date.replace(day=1), freq="MS", periods=6)
    
    print(date_range)
    
    DatetimeIndex(['2023-05-01', '2023-06-01', '2023-07-01', '2023-08-01',
                   '2023-09-01', '2023-10-01'],
                  dtype='datetime64[ns]', freq='MS')
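
To make the behaviour described in this answer visible, here is a minimal sketch that uses the `rollforward` and `rollback` methods of `pd.offsets.MonthBegin`. The variable names `month_begin`, `rolled_forward`, and `rolled_back` are only illustrative, and this is offered as an alternative to `replace(day=1)`, not as the way pandas implements `date_range` internally.

```python
import pandas as pd

start_date = pd.Timestamp(2023, 5, 31)
month_begin = pd.offsets.MonthBegin()

# First month start on or after the start date: this matches where
# date_range(start=start_date, freq="MS") begins.
rolled_forward = month_begin.rollforward(start_date)

# Month start on or before the start date: this is the value to pass
# as `start` if the range should include the start date's own month.
rolled_back = month_begin.rollback(start_date)

date_range = pd.date_range(start=rolled_back, freq="MS", periods=6)

print(rolled_forward)
print(rolled_back)
print(date_range)
```

A date that already falls on the first of a month is left unchanged by both `rollforward` and `rollback`, so the same call should be safe for start dates that need no adjustment.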