Search code examples
pythonpandasdataframesubtotal

Cannot insert subtotals into pandas dataframe


I'm rather new to Python and to Pandas. With the help of Google and StackOverflow, I've been able to get most of what I'm after. However, this one has me stumped. I have a dataframe that looks like this:

    SalesPerson 1       SalesPerson 2       SalesPerson 3   
    Revenue Number of Orders    Revenue Number of Orders    Revenue Number of Orders
In Process Stage 1  8347    8   9941    5   5105    7
In Process Stage 2  3879    2   3712    3   1350    10
In Process Stage 3  7885    4   6513    8   2218    2
Won Not Invoiced    4369    1   1736    5   4950    9
Won Invoiced    7169    5   5308    3   9832    2
Lost to Competitor  8780    1   3836    7   2851    3
Lost to No Action   2835    5   4653    1   1270    2

I would like to add subtotal rows for In Process, Won, and Lost, so that my data looks like:

    SalesPerson 1       SalesPerson 2       SalesPerson 3   
    Revenue Number of Orders    Revenue Number of Orders    Revenue Number of Orders
In Process Stage 1  8347    8   9941    5   5105    7
In Process Stage 2  3879    2   3712    3   1350    10
In Process Stage 3  7885    4   6513    8   2218    2
In Process Subtotal 20111   14  20166   16  8673    19
Won Not Invoiced    4369    1   1736    5   4950    9
Won Invoiced    7169    5   5308    3   9832    2
Won Subtotal    11538   6   7044    8   14782   11
Won Percent 27% 23% 20% 25% 54% 31%
Lost to Competitor  8780    1   3836    7   2851    3
Lost to No Action   2835    5   4653    1   1270    2
Lost Subtotal   11615   6   8489    8   4121    5
Lost Percent    27% 23% 24% 25% 15% 14%
Total   43264   26  35699   32  27576   35

So far, my code looks like:

    def create_win_lose_table(dataframe):
    in_process_stagename_list = {'In Process Stage 1', 'In Process Stage 2', 'In Process Stage 3'}
    won_stagename_list = {'Won Invoiced', 'Won Not Invoiced'}
    lost_stagename_list = {'Lost to Competitor', 'Lost to No Action'}

    temp_Pipeline_df = dataframe.copy()
    for index, row in temp_Pipeline_df.iterrows():
        if index not in in_process_stagename_list:
            temp_Pipeline_df.drop([index], inplace = True)

    Pipeline_sum = temp_Pipeline_df.sum()
#at the end I was going to concat the sum to the original dataframe, but that's where I'm stuck

I have only started to work on the in process dataframe. My thought was that once I figured that out I could then just duplicate that process for the Won and Lost categories. Any thoughts or approaches are welcome.

Thank you! Jon


Solution

  • Simple Example for you.

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame(np.random.rand(5, 5))
    df_total = pd.DataFrame(np.sum(df.iloc[:, :].values, axis=0)).transpose()
    df_with_totals = df.append(df_total)
    
    df_with_totals
              0         1         2         3         4
    0  0.743746  0.668769  0.894739  0.947641  0.753029
    1  0.236587  0.862352  0.329624  0.637625  0.288876
    2  0.817637  0.250593  0.363517  0.572789  0.785234
    3  0.140941  0.221746  0.673470  0.792831  0.170667
    4  0.965435  0.836577  0.790037  0.996000  0.229857
    0  2.904346  2.840037  3.051388  3.946885  2.227662
    

    You can use the rename argument in Pandas to call the summary row whatever you want.