Search code examples
pythonpandasdataframetimedelta

Fill rows based on start vs end date in Pandas, including start date


I currently have an issue of spanning rows given by a start vs end date range in Pandas, but need to bring the start date as well, not only the difference. I tried this solution link.

Credits to @Shubham Sharma who helped with this and to @Pandinus who asked the question originally.

This works to span the rows of both date differences, I just need to bring the start date on the new column as well, but I have not had so much luck.

Should I continue using this code or would someone propose differently to achieve a good result?

d = df['date_end'].sub(df['date_start']).dt.days
df1 = df.reindex(df.index.repeat(d))
i = df1.groupby(level=0).cumcount() + 1

df1['date'] = df1['date_start'] + pd.to_timedelta(i, unit='d')

Grabbing an example from that post, this is the result I need:

Id  num color   date_start  date_end    Dailydate
0   aa0 blue    1/1/2022    1/2/2022    1/1/2022
0   aa0 blue    1/1/2022    1/2/2022    1/2/2022
1   aa1 red     1/1/2022    1/4/2022    1/1/2022
1   aa1 red     1/1/2022    1/4/2022    1/2/2022
1   aa1 red     1/1/2022    1/4/2022    1/3/2022
1   aa1 red     1/1/2022    1/4/2022    1/4/2022
2   aa2 yellow  1/7/2022    1/9/2022    1/7/2022
2   aa2 yellow  1/7/2022    1/9/2022    1/8/2022
2   aa2 yellow  1/7/2022    1/9/2022    1/9/2022
3   aa3 green   1/12/2022   1/14/2022   1/12/2022
3   aa3 green   1/12/2022   1/14/2022   1/13/2022
3   aa3 green   1/12/2022   1/14/2022   1/14/2022

I have tried playing with cumcount function and even adding the column first, then readding it with a +1 , but it gives me datime issues. And I'm also thinking it's not the best approach.

Any ideas? Thanks.


Solution

  • IIUC, you can use apply on axis=1 to create date ranges, then explode them:

    # input from other post
    data = {
        "id": ["aa0", "aa1", "aa2", "aa3"],
        "number": [1, 2, 2, 1],
        "color": ["blue", "red", "yellow", "green"],
        "date_start": [
            date(2022, 1, 1),
            date(2022, 1, 1),
            date(2022, 1, 7),
            date(2022, 1, 12),
        ],
        "date_end": [
            date(2022, 1, 2),
            date(2022, 1, 4),
            date(2022, 1, 9),
            date(2022, 1, 14),
        ],
    }
    
    df = pd.DataFrame(data)
    
    # solution
    df["Dailydate"] = df.apply(
        lambda row: pd.date_range(row["date_start"], row["date_end"]), axis=1
    )
    df = df.explode("Dailydate").reset_index(names="Id")
    
        Id   id  number   color date_start   date_end  Dailydate
    0    0  aa0       1    blue 2022-01-01 2022-01-02 2022-01-01
    1    0  aa0       1    blue 2022-01-01 2022-01-02 2022-01-02
    2    1  aa1       2     red 2022-01-01 2022-01-04 2022-01-01
    3    1  aa1       2     red 2022-01-01 2022-01-04 2022-01-02
    4    1  aa1       2     red 2022-01-01 2022-01-04 2022-01-03
    5    1  aa1       2     red 2022-01-01 2022-01-04 2022-01-04
    6    2  aa2       2  yellow 2022-01-07 2022-01-09 2022-01-07
    7    2  aa2       2  yellow 2022-01-07 2022-01-09 2022-01-08
    8    2  aa2       2  yellow 2022-01-07 2022-01-09 2022-01-09
    9    3  aa3       1   green 2022-01-12 2022-01-14 2022-01-12
    10   3  aa3       1   green 2022-01-12 2022-01-14 2022-01-13
    11   3  aa3       1   green 2022-01-12 2022-01-14 2022-01-14