Search code examples
pythonpandasdataframepandas.excelwriter

Pandas Dataframe Groupby with multiple columns and sum it


I have a dataframe df which i need to groupby multiple column based on a condition.

Input

Employee Name Subjects Billable Hours Date
Anu Java Yes 8 01-03-2021
Anu Python Yes 9 02-03-2021
Anu SQL No 6 03-03-2021
Anu React Yes 5 03-03-2021
Anu .Net No 8 04-03-2021
Bala SQL No 5 01-03-2021
Bala Python Yes 4 01-03-2021
Bala Java Yes 2 02-03-2021
Bala .Net No 8 03-03-2021
Bala React Yes 7 04-03-2021

Columns should be add in output file: Utilization, Billable Utilization, Non-Billable Utilization

  1. Calculation for Utilization columns : Utilization = df[Hours]/9*100

-> If billable column value as "YES"" - create column as Billable utilization

  1. Calculation for Billable Utilization in column: Billable Utilization = df[Hours]/9 * 100

-> If billable column value as "NO"" - create column as Non-Billable utilization

  1. Calculation for Non-Billing Utilization in column: Non-Billable Utilization = df[Hours]/9*100

Code

available =9
utilization= (df['Hours']/available*100).round(2)
df = df.assign(Utilization = utilization)
df1 = df.groupby(['Date','Employee Name'])['Utilization'].sum()
df['Billable'] = np.where(df['Billable'] == 'Yes', 'Billable Utilization','Non Billable Utilization')
df2 = (df.groupby(['Date','Employee Name']).agg({'Hours':sum}).div(available).mul(100)).round(2)

Expected Output Expected Output

My output

My Output


Solution

  • You can "pivot" Hours into Billable and Non-Billable columns using pivot_table() and then convert to utilization:

    # sample data
    df = pd.DataFrame({'Employee Name':{0:'Anu',1:'Anu',2:'Anu',3:'Anu',4:'Anu',5:'Bala',6:'Bala',7:'Bala',8:'Bala',9:'Bala'},'Subjects':{0:'Java',1:'Python',2:'SQL',3:'React',4:'.Net',5:'SQL',6:'Python',7:'Java',8:'.Net',9:'React'},'Billable':{0:'Yes',1:'Yes',2:'No',3:'Yes',4:'No',5:'No',6:'Yes',7:'Yes',8:'No',9:'Yes'},'Hours':{0:8,1:9,2:6,3:5,4:8,5:5,6:4,7:2,8:8,9:7},'Date':{0:'01-03-2021',1:'02-03-2021',2:'03-03-2021',3:'03-03-2021',4:'04-03-2021',5:'01-03-2021',6:'01-03-2021',7:'02-03-2021',8:'03-03-2021',9:'04-03-2021'}})
    
    # pivot hours into (non)billable
    df = df.pivot_table(index=['Date', 'Employee Name'], columns='Billable').reset_index()
    df.columns = ['Date', 'Employee Name', 'Non-Billable', 'Billable']
    
    # compute utilization
    df['Non-Billable'] = df['Non-Billable'].div(9).mul(100).round(2)
    df['Billable'] = df['Billable'].div(9).mul(100).round(2)
    df['Utilization'] = df[['Billable', 'Non-Billable']].fillna(0).sum(axis=1)
    
    #          Date Employee Name  Non-Billable  Billable  Utilization
    # 0  01-03-2021           Anu           NaN     88.89        88.89
    # 1  01-03-2021          Bala         55.56     44.44       100.00
    # 2  02-03-2021           Anu           NaN    100.00       100.00
    # 3  02-03-2021          Bala           NaN     22.22        22.22
    # 4  03-03-2021           Anu         66.67     55.56       122.23
    # 5  03-03-2021          Bala         88.89       NaN        88.89
    # 6  04-03-2021           Anu         88.89       NaN        88.89
    # 7  04-03-2021          Bala           NaN     77.78        77.78