Search code examples
pythonpandasdaterow

Create extra rows using date column pandas dataframe


Imagine I have the following data:

ID  Leave Type  Start Date    End Date
1   Sick        2022-01-01    2022-01-01
1   Holiday     2023-03-28  
2   Holiday     2023-01-01    2023-01-02
3   Work        2023-01-01    2023-01-01

I need to find a way to confirm Start Date and End Date have the same value. In case it is not, it needs to count the number of days the End Date is ahead and, for each day, create a row adding 1 day and always matching Start Date and End Date. If End Date is blank, it should create rows until it reaches the day of 2023-03-30. This way resulting on this data:

ID  Leave Type  Start Date    End Date
1   Sick        2022-01-01    2022-01-01
1   Holiday     2023-03-28    2023-03-28
1   Holiday     2023-03-29    2023-03-29
1   Holiday     2023-03-30    2023-03-30
1   Holiday     2023-03-31    2023-03-31
2   Holiday     2023-01-01    2023-01-01
2   Holiday     2023-01-02    2023-01-02
3   Work        2023-01-01    2023-01-01

Thank you!


Solution

  • import pandas as pd
    from pandas.tseries.offsets import MonthEnd
    
    df = pd.DataFrame({'Leave Type': ['Sick', 'Holiday', 'Holiday', 'Work'],
                       'Start Date': ['2022-01-01', '2023-03-28', '2023-01-01', '2023-01-01'],
                       'End Date': ['2022-01-01', '', '2023-01-02', '2023-01-01'],
                       })
    # Converts columns 'Leave Type' and 'Start Date' to datetime
    df[['Start Date', 'End Date']] = \
        df[['Start Date', 'End Date']].apply(pd.to_datetime, errors='coerce')
    # Fill NaT values with the last day of the month
    df['End Date'] = df['End Date'].fillna(df['Start Date'] + MonthEnd(0))
    # Replace 'Start Date' values with list of date ranges
    df['End Date'] = \
        [pd.date_range(s, e, freq='D').tolist() for s,e in zip(df['Start Date'], df['End Date'])]
    # Explode the list
    df = df.explode('End Date')
    
    df['Start Date'] = df['End Date']
    
    print(df)
    

    Result

      Leave Type Start Date   End Date
    0       Sick 2022-01-01 2022-01-01
    1    Holiday 2023-03-28 2023-03-28
    1    Holiday 2023-03-29 2023-03-29
    1    Holiday 2023-03-30 2023-03-30
    1    Holiday 2023-03-31 2023-03-31
    2    Holiday 2023-01-01 2023-01-01
    2    Holiday 2023-01-02 2023-01-02
    3       Work 2023-01-01 2023-01-01