Search code examples
pythonpandasdataframedategroup-by

Dataframe: Using "Name" Column as Segmentation, Get all dates between Startdate and Enddate columns While Excluding Weekend Dates


Let's say I have the starting input table below.

Name _leavestartdate _leaveenddate
Jerry Rice 2023-12-26 2023-12-26
Pablo Picasso 2023-08-15 2023-08-21
Ray Ramon 2023-03-16 2023-03-17

The Python code to create it is:

import pandas as pd
df = pd.DataFrame({'name':["Jerry Rice","Pablo Picasso","Ray Ramon"],'_leavestartdate':["2023-12-26","2023-08-15","2023-03-16"],'_leaveenddate':["2023-12-26","2023-08-21","2023-03-17"]})

First, I want to get all dates between the _leavestartdate and _leaveenddate split based on Name column. The list of dates would populate under new column "Date". I also want to exclude any weekend dates in the newly reformatted table.

So my desired output table is shown below. Note that it only shows weekday dates and excludes the 2 dates (8/19/2023 and 8/20/2023) from Pablo Picasso that are weekend dates.

Name Date _leavestartdate _leaveenddate
Jerry Rice 2023-12-26 2023-12-26 2023-12-26
Pablo Picasso 2023-08-15 2023-08-15 2023-08-21
Pablo Picasso 2023-08-16 2023-08-15 2023-08-21
Pablo Picasso 2023-08-17 2023-08-15 2023-08-21
Pablo Picasso 2023-08-18 2023-08-15 2023-08-21
Pablo Picasso 2023-08-21 2023-08-15 2023-08-21
Ray Ramon 2023-03-16 2023-03-16 2023-03-17
Ray Ramon 2023-03-17 2023-03-16 2023-03-17

I looked at previous solutions in StackOverflow and tried the following code:

date = pd.DataFrame(pd.date_range(start=df.min()._leavestartdate, 
                     end=df.max()._leaveenddate), columns=['Date'])

df_final = pd.merge(left=date, right=df, left_on='Date', right_on='_leavestartdate', 
         how='outer').fillna(method='ffill')

But the solution produces the following wrong table because it's not accounting for the "Name" column:

enter image description here

How can I fix the code to get my desired output table?


Solution

  • import pandas as pd
    
    df['_leavestartdate'] = pd.to_datetime(df['_leavestartdate'])
    df['_leaveenddate'] = pd.to_datetime(df['_leaveenddate'])
    
    
    df_final = pd.concat([(pd.DataFrame({'Date':
                                 pd.date_range(df.loc[i, '_leavestartdate'],
                                               df.loc[i, '_leaveenddate']),
                                         'name': df.loc[i, 'name']}))
                          for i in range(len(df))])
    
    df_final = df_final[df_final['Date'].dt.dayofweek < 5]
    
    
    print(df_final.merge(df, how='left', on='name'))
    

    Here, in the list comprehension, the necessary rows of dataframes are generated and concatenate. Then the lines with Saturday and Sunday were removed and a left-hand merge was made by name.

    You can even do without a merge and write _leavestartdate, _leaveenddate in df_final:

    df_final = pd.concat([(pd.DataFrame({'Date':
                                             pd.date_range(df.loc[i, '_leavestartdate'],
                                                           df.loc[i, '_leaveenddate']),
                                         'name': df.loc[i, 'name'],
                                         '_leavestartdate': df.loc[i, '_leavestartdate'],
                                         '_leaveenddate': df.loc[i, '_leaveenddate']}))
                          for i in range(len(df))])
    
    df_final = df_final[df_final['Date'].dt.dayofweek < 5]
    

    Output

            Date           name _leavestartdate _leaveenddate
    0 2023-12-26     Jerry Rice      2023-12-26    2023-12-26
    1 2023-08-15  Pablo Picasso      2023-08-15    2023-08-21
    2 2023-08-16  Pablo Picasso      2023-08-15    2023-08-21
    3 2023-08-17  Pablo Picasso      2023-08-15    2023-08-21
    4 2023-08-18  Pablo Picasso      2023-08-15    2023-08-21
    5 2023-08-21  Pablo Picasso      2023-08-15    2023-08-21
    6 2023-03-16      Ray Ramon      2023-03-16    2023-03-17
    7 2023-03-17      Ray Ramon      2023-03-16    2023-03-17
    

    If I didn’t take into account anything, please write.