When using pandas.date_range with a start date, a frequency, and a number of periods, the range rounds up to the next month when the start date is the last day of a month.
It seems like a silent edge case bug. If it's not a bug, any idea why it does that?
For example
import pandas as pd
start_date = pd.Timestamp(2023, 5, 31)
date_range = pd.date_range(start=start_date, freq="MS", periods=6)
results in
DatetimeIndex(['2023-06-01', '2023-07-01', '2023-08-01', '2023-09-01',
'2023-10-01', '2023-11-01'],
dtype='datetime64[ns]', freq='MS')
From the documentation, I'd expect it to start in May and end in October:
DatetimeIndex(['2023-05-01', '2023-06-01', '2023-07-01', '2023-08-01', '2023-09-01',
'2023-10-01'],
dtype='datetime64[ns]', freq='MS')
I thought it had to do with the inclusive argument, but that's not the reason either.
pd.date_range generates a range of dates between start and end. Since 2023-05-01 is earlier than the start date 2023-05-31, it falls outside the range and is never included: the first month start on or after the start date is 2023-06-01. To get what you want, replace the day of the pd.Timestamp with 1.
import pandas as pd
start_date = pd.Timestamp(2023, 5, 31)
date_range = pd.date_range(start=start_date.replace(day=1), freq="MS", periods=6)
print(date_range)
DatetimeIndex(['2023-05-01', '2023-06-01', '2023-07-01', '2023-08-01',
'2023-09-01', '2023-10-01'],
dtype='datetime64[ns]', freq='MS')
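As a possible alternative to hard-coding day=1, the month-start offset itself can snap the start date back. The sketch below assumes pd.offsets.MonthBegin().rollback fits your case; the variable names rolled_forward and month_start are only illustrative and not part of the original question.

```python
import pandas as pd

start_date = pd.Timestamp(2023, 5, 31)

# With freq="MS", a start that is not itself a month start is rolled forward
# to the next month start, which is why the original range begins in June.
rolled_forward = pd.date_range(start=start_date, freq="MS", periods=6)
print(rolled_forward[0])

# rollback() moves a date back to the previous month start and leaves a date
# that is already a month start unchanged.
month_start = pd.offsets.MonthBegin().rollback(start_date)
date_range = pd.date_range(start=month_start, freq="MS", periods=6)
print(date_range)
```

For this input it should give the same range as start_date.replace(day=1). The difference is mostly one of intent: rollback ties the adjustment to the same kind of offset that freq="MS" uses, rather than to a fixed day number.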