I'm rather new to Python and to Pandas. With the help of Google and StackOverflow, I've been able to get most of what I'm after. However, this one has me stumped. I have a dataframe that looks like this:
SalesPerson 1 SalesPerson 2 SalesPerson 3
Revenue Number of Orders Revenue Number of Orders Revenue Number of Orders
In Process Stage 1 8347 8 9941 5 5105 7
In Process Stage 2 3879 2 3712 3 1350 10
In Process Stage 3 7885 4 6513 8 2218 2
Won Not Invoiced 4369 1 1736 5 4950 9
Won Invoiced 7169 5 5308 3 9832 2
Lost to Competitor 8780 1 3836 7 2851 3
Lost to No Action 2835 5 4653 1 1270 2
I would like to add subtotal rows for In Process, Won, and Lost, so that my data looks like:
SalesPerson 1 SalesPerson 2 SalesPerson 3
Revenue Number of Orders Revenue Number of Orders Revenue Number of Orders
In Process Stage 1 8347 8 9941 5 5105 7
In Process Stage 2 3879 2 3712 3 1350 10
In Process Stage 3 7885 4 6513 8 2218 2
In Process Subtotal 20111 14 20166 16 8673 19
Won Not Invoiced 4369 1 1736 5 4950 9
Won Invoiced 7169 5 5308 3 9832 2
Won Subtotal 11538 6 7044 8 14782 11
Won Percent 27% 23% 20% 25% 54% 31%
Lost to Competitor 8780 1 3836 7 2851 3
Lost to No Action 2835 5 4653 1 1270 2
Lost Subtotal 11615 6 8489 8 4121 5
Lost Percent 27% 23% 24% 25% 15% 14%
Total 43264 26 35699 32 27576 35
So far, my code looks like:
def create_win_lose_table(dataframe):
in_process_stagename_list = {'In Process Stage 1', 'In Process Stage 2', 'In Process Stage 3'}
won_stagename_list = {'Won Invoiced', 'Won Not Invoiced'}
lost_stagename_list = {'Lost to Competitor', 'Lost to No Action'}
temp_Pipeline_df = dataframe.copy()
for index, row in temp_Pipeline_df.iterrows():
if index not in in_process_stagename_list:
temp_Pipeline_df.drop([index], inplace = True)
Pipeline_sum = temp_Pipeline_df.sum()
#at the end I was going to concat the sum to the original dataframe, but that's where I'm stuck
I have only started to work on the in process dataframe. My thought was that once I figured that out I could then just duplicate that process for the Won and Lost categories. Any thoughts or approaches are welcome.
Thank you! Jon
Simple Example for you.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(5, 5))
df_total = pd.DataFrame(np.sum(df.iloc[:, :].values, axis=0)).transpose()
df_with_totals = df.append(df_total)
df_with_totals
0 1 2 3 4
0 0.743746 0.668769 0.894739 0.947641 0.753029
1 0.236587 0.862352 0.329624 0.637625 0.288876
2 0.817637 0.250593 0.363517 0.572789 0.785234
3 0.140941 0.221746 0.673470 0.792831 0.170667
4 0.965435 0.836577 0.790037 0.996000 0.229857
0 2.904346 2.840037 3.051388 3.946885 2.227662
You can use the rename argument in Pandas to call the summary row whatever you want.