Imagine I have the following data:
ID Leave Type Start Date End Date
1 Sick 2022-01-01 2022-01-01
1 Holiday 2023-03-28
2 Holiday 2023-01-01 2023-01-02
3 Work 2023-01-01 2023-01-01
I need to find a way to confirm Start Date and End Date have the same value. In case it is not, it needs to count the number of days the End Date is ahead and, for each day, create a row adding 1 day and always matching Start Date and End Date. If End Date is blank, it should create rows until it reaches the day of 2023-03-30. This way resulting on this data:
ID Leave Type Start Date End Date
1 Sick 2022-01-01 2022-01-01
1 Holiday 2023-03-28 2023-03-28
1 Holiday 2023-03-29 2023-03-29
1 Holiday 2023-03-30 2023-03-30
1 Holiday 2023-03-31 2023-03-31
2 Holiday 2023-01-01 2023-01-01
2 Holiday 2023-01-02 2023-01-02
3 Work 2023-01-01 2023-01-01
Thank you!
import pandas as pd
from pandas.tseries.offsets import MonthEnd
df = pd.DataFrame({'Leave Type': ['Sick', 'Holiday', 'Holiday', 'Work'],
'Start Date': ['2022-01-01', '2023-03-28', '2023-01-01', '2023-01-01'],
'End Date': ['2022-01-01', '', '2023-01-02', '2023-01-01'],
})
# Converts columns 'Leave Type' and 'Start Date' to datetime
df[['Start Date', 'End Date']] = \
df[['Start Date', 'End Date']].apply(pd.to_datetime, errors='coerce')
# Fill NaT values with the last day of the month
df['End Date'] = df['End Date'].fillna(df['Start Date'] + MonthEnd(0))
# Replace 'Start Date' values with list of date ranges
df['End Date'] = \
[pd.date_range(s, e, freq='D').tolist() for s,e in zip(df['Start Date'], df['End Date'])]
# Explode the list
df = df.explode('End Date')
df['Start Date'] = df['End Date']
print(df)
Result
Leave Type Start Date End Date
0 Sick 2022-01-01 2022-01-01
1 Holiday 2023-03-28 2023-03-28
1 Holiday 2023-03-29 2023-03-29
1 Holiday 2023-03-30 2023-03-30
1 Holiday 2023-03-31 2023-03-31
2 Holiday 2023-01-01 2023-01-01
2 Holiday 2023-01-02 2023-01-02
3 Work 2023-01-01 2023-01-01