Let's say I have the starting input table below.
Name | _leavestartdate | _leaveenddate |
---|---|---|
Jerry Rice | 2023-12-26 | 2023-12-26 |
Pablo Picasso | 2023-08-15 | 2023-08-21 |
Ray Ramon | 2023-03-16 | 2023-03-17 |
The Python code to create it is:
import pandas as pd
df = pd.DataFrame({'name':["Jerry Rice","Pablo Picasso","Ray Ramon"],'_leavestartdate':["2023-12-26","2023-08-15","2023-03-16"],'_leaveenddate':["2023-12-26","2023-08-21","2023-03-17"]})
First, I want to get all dates between the _leavestartdate and _leaveenddate split based on Name column. The list of dates would populate under new column "Date". I also want to exclude any weekend dates in the newly reformatted table.
So my desired output table is shown below. Note that it only shows weekday dates and excludes the 2 dates (8/19/2023 and 8/20/2023) from Pablo Picasso that are weekend dates.
Name | Date | _leavestartdate | _leaveenddate |
---|---|---|---|
Jerry Rice | 2023-12-26 | 2023-12-26 | 2023-12-26 |
Pablo Picasso | 2023-08-15 | 2023-08-15 | 2023-08-21 |
Pablo Picasso | 2023-08-16 | 2023-08-15 | 2023-08-21 |
Pablo Picasso | 2023-08-17 | 2023-08-15 | 2023-08-21 |
Pablo Picasso | 2023-08-18 | 2023-08-15 | 2023-08-21 |
Pablo Picasso | 2023-08-21 | 2023-08-15 | 2023-08-21 |
Ray Ramon | 2023-03-16 | 2023-03-16 | 2023-03-17 |
Ray Ramon | 2023-03-17 | 2023-03-16 | 2023-03-17 |
I looked at previous solutions in StackOverflow and tried the following code:
date = pd.DataFrame(pd.date_range(start=df.min()._leavestartdate,
end=df.max()._leaveenddate), columns=['Date'])
df_final = pd.merge(left=date, right=df, left_on='Date', right_on='_leavestartdate',
how='outer').fillna(method='ffill')
But the solution produces the following wrong table because it's not accounting for the "Name" column:
How can I fix the code to get my desired output table?
import pandas as pd
df['_leavestartdate'] = pd.to_datetime(df['_leavestartdate'])
df['_leaveenddate'] = pd.to_datetime(df['_leaveenddate'])
df_final = pd.concat([(pd.DataFrame({'Date':
pd.date_range(df.loc[i, '_leavestartdate'],
df.loc[i, '_leaveenddate']),
'name': df.loc[i, 'name']}))
for i in range(len(df))])
df_final = df_final[df_final['Date'].dt.dayofweek < 5]
print(df_final.merge(df, how='left', on='name'))
Here, in the list comprehension, the necessary rows of dataframes are generated and concatenate. Then the lines with Saturday and Sunday were removed and a left-hand merge was made by name.
You can even do without a merge and write _leavestartdate, _leaveenddate in df_final:
df_final = pd.concat([(pd.DataFrame({'Date':
pd.date_range(df.loc[i, '_leavestartdate'],
df.loc[i, '_leaveenddate']),
'name': df.loc[i, 'name'],
'_leavestartdate': df.loc[i, '_leavestartdate'],
'_leaveenddate': df.loc[i, '_leaveenddate']}))
for i in range(len(df))])
df_final = df_final[df_final['Date'].dt.dayofweek < 5]
Output
Date name _leavestartdate _leaveenddate
0 2023-12-26 Jerry Rice 2023-12-26 2023-12-26
1 2023-08-15 Pablo Picasso 2023-08-15 2023-08-21
2 2023-08-16 Pablo Picasso 2023-08-15 2023-08-21
3 2023-08-17 Pablo Picasso 2023-08-15 2023-08-21
4 2023-08-18 Pablo Picasso 2023-08-15 2023-08-21
5 2023-08-21 Pablo Picasso 2023-08-15 2023-08-21
6 2023-03-16 Ray Ramon 2023-03-16 2023-03-17
7 2023-03-17 Ray Ramon 2023-03-16 2023-03-17
If I didn’t take into account anything, please write.