I have the following sample data set :
with this code I managed to load it in a pandas dataframe and do basic manipulation like having the total number of days per profile and total cost.
import pandas as pd
df = pd.read_excel('multiindex.xlsx', "0001",header=[0,1,2,3],index_col=[0,1,2])
print(df)
print(df.axes[1])
print(df.axes[0])
df['Total days'] = df.groupby(axis = 1,level = 'Full project').sum()
df['Total cost']=(df.index.get_level_values('Daily rate')*df['Total days']).values
print (df)
Now I want to add a "Total cost" for each phase, I can create the aggregated df with the following code
df2 = df.groupby(axis = 1,level = 'Phase').sum()
print( df2 )
Phase Build Discovery
Profile Name Daily rate
Dev dev1 10 110.0 4.0 6.0
dev2 10 88.0 4.0 4.0
dev3 10 88.0 4.0 4.0
QA qa1 10 110.0 4.0 6.0
qa2 10 88.0 4.0 4.0
But how can I multiply "Daily rates" by each sub-columns in Phase and add a column named "[phase name] total cost" without having to specify each phase name, and get a result like this :
Use DataFrame.mul
for multiple all columns, then use DataFrame.add_prefix
and append to original DataFrame df2
:
df2 = df.sum(axis = 1,level = 'Phase')
df22 = df2.mul(df2.index.get_level_values('Daily rate'), axis=0).add_prefix(' total cost')
df2 = df2.join(df22)