Search code examples
pandasdatedate-range

How to create pandas.date_range() for each row from column "start_date" and column "end_date"?


I have a df like:

id | start_date | end_date  | price
1  | 2020-10-01 | 2020-10-3 | 1
1  | 2020-10-03 | 2020-10-4 | 1
2  | 2020-10-04 | 2020-10-6 | 2
3  | 2020-10-05 | 2020-10-5 | 3

Columns "start_date" and "end_date" are datetime64[ns].

I want to create a "date" column from the date range.

Easiest way is creating a pandas.date_range(start_date, end_date, freq="D"), then using .explode().

End result should look like:

id | start_date | end_date  | price | date
1  | 2020-10-01 | 2020-10-3 | 1     | 2020-10-01
1  | 2020-10-01 | 2020-10-3 | 1     | 2020-10-02
1  | 2020-10-01 | 2020-10-3 | 1     | 2020-10-03
1  | 2020-10-03 | 2020-10-4 | 1     | 2020-10-03
1  | 2020-10-03 | 2020-10-4 | 1     | 2020-10-04
2  | 2020-10-04 | 2020-10-6 | 2     | 2020-10-04
2  | 2020-10-04 | 2020-10-6 | 2     | 2020-10-05
2  | 2020-10-04 | 2020-10-6 | 2     | 2020-10-06
3  | 2020-10-05 | 2020-10-5 | 3     | 2020-10-05

Tried so far:

df["daterange"] = pd.date_range(df["start_date"], df["end_date"])

TypeError: Cannot convert input [0 2020-10-01 1 2020-10-01

for row in df.itertuples():
    df["daterange"] = pd.date_range(start=row.start_date, end=row.end_date)

ValueError: Length of values (3) does not match length of index (9)

Lambdas, apply, melt, etc. is way too slow for my dataframe size and is not useable!

/edit

Fastet method I found so far:

https://github.com/Garve/scikit-bonus
skbonus.pandas.preprocessing.DateTimeExploder(
        "date",
        start_column="start_date",
        end_column="end_date",
        frequency="d",
        drop=False,
    )

Solution

  • Fasted method I've found so far:

    https://github.com/Garve/scikit-bonus
    from skbonus.pandas.preprocessing import DateTimeExploder
    df = DateTimeExploder(
            "date",
            start_column="start_date",
            end_column="end_date",
            frequency="d",
            drop=False,
        )