Search code examples
pythonpandassortingpivot-tablereindex

Sorting in a Pandas pivot_table


I have been looking all over trying to figure out how to sort my pivot table correctly and I haven't had any luck.

    client          unit    task                hours   month
0   A               DVADA   Account Management  6.50    January     
1   A               DVADA   Buying              1.25    January 
2   A               DVADA   Meeting / Call      0.50    January 
3   A               DVADA   Account Management  3.00    January 
4   A               DVADA   Billing             2.50    February    
5   A               DVADA   Account Management  6.50    February        
6   A               DVADA   Buying              1.25    February    
7   A               DVADA   Meeting / Call      0.50    February    
8   A               DVADA   Account Management  3.00    February    
9   A               DVADA   Billing             2.50    February
10  A               DVADA   Billing             2.50    December    
11  A               DVADA   Account Management  6.50    December        
12  A               DVADA   Buying              1.25    December    
13  A               DVADA   Meeting / Call      0.50    December    
14  A               DVADA   Account Management  3.00    December    
15  A               DVADA   Billing             2.50    December
16  A               DVADA   Account Management  6.50    August      
17  A               DVADA   Buying              1.25    August  
18  A               DVADA   Meeting / Call      0.50    August  
19  A               DVADA   Account Management  3.00    August
20  A               DVADA   Account Management  6.50    April       
21  A               DVADA   Buying              1.25    April   
22  A               DVADA   Meeting / Call      0.50    April   
23  A               DVADA   Account Management  3.00    April
24  B               DVADA   Account Management  6.50    January     
25  B               DVADA   Buying              1.25    January 
26  B               DVADA   Meeting / Call      0.50    January 
27  B               DVADA   Account Management  3.00    January 
28  B               DVADA   Billing             2.50    February    
29  B               DVADA   Account Management  6.50    February        
30  B               DVADA   Buying              1.25    February    
31  B               DVADA   Meeting / Call      0.50    February    
32  B               DVADA   Account Management  3.00    February    
33  B               DVADA   Billing             2.50    February
34  B               DVADA   Billing             2.50    December    
35  B               DVADA   Account Management  6.50    December        
36  B               DVADA   Buying              1.25    December    
37  B               DVADA   Meeting / Call      0.50    December    
38  B               DVADA   Account Management  3.00    December    
39  B               DVADA   Billing             2.50    December
40  B               DVADA   Account Management  6.50    August      
41  B               DVADA   Buying              1.25    August  
42  B               DVADA   Meeting / Call      0.50    August  
43  B               DVADA   Account Management  3.00    August
44  B               DVADA   Account Management  6.50    April       
45  B               DVADA   Buying              1.25    April   
46  B               DVADA   Meeting / Call      0.50    April   
47  C               DVADA   Account Management  3.00    April
48  C               DVADA   Account Management  6.50    January     
49  C               DVADA   Buying              1.25    January 
50  C               DVADA   Meeting / Call      0.50    January 
51  C               DVADA   Account Management  3.00    January 
52  C               DVADA   Billing             2.50    February    
53  C               DVADA   Account Management  6.50    February        
54  C               DVADA   Buying              1.25    February    
55  C               DVADA   Meeting / Call      0.50    February    
56  C               DVADA   Account Management  3.00    February    
57  C               DVADA   Billing             2.50    February
58  C               DVADA   Billing             2.50    December    
59  C               DVADA   Account Management  6.50    December        
60  C               DVADA   Buying              1.25    December    
61  C               DVADA   Meeting / Call      0.50    December    
62  C               DVADA   Account Management  3.00    December    
63  C               DVADA   Billing             2.50    December
64  C               DVADA   Account Management  6.50    August      
65  C               DVADA   Buying              1.25    August  
66  C               DVADA   Meeting / Call      0.50    August  
67  C               DVADA   Account Management  3.00    August
68  C               DVADA   Account Management  6.50    April       
69  C               DVADA   Buying              1.25    April   
70  C               DVADA   Meeting / Call      0.50    April   
71  C               DVADA   Account Management  3.00    April

df = pd.pivot_table(vp_clients, values='hours', index=['client', 'month'], aggfunc=sum)

Which returns a pivot table with three columns (client, month, hours). Each client has 12 months (Jan-Dec) and each of those months has a hours for that month.

                        hours
client          month

A               April   203.50
                August  227.75
                December 159.75
                February 203.25
                January 199.25

B               April   203.50
                August  227.75
                December 159.75
                February 203.25
                January 199.25

C               April   203.50
                August  227.75
                December 159.75
                February 203.25
                January 199.25

I want to sort this pivot table by the months but keep the client column in tacked.

                           hours
client           month

A               January 203.50
                February 227.75
                March    159.75
                April    203.25
                May     199.90

B               January 203.50
                February 227.75
                March    159.75
                April    203.25
                May     199.90

C               January 203.50
                February 227.75
                March    159.75
                April    203.25
                May     199.90

The sorting issue is fixed with the answer below from Scott. Now I want to add a row to each client with the total hours used.

                           hours
client           month

A               January    203.50
                February   227.75
                March      159.75
                April      203.25
                May        199.90
                Total     1000.34

B               January    203.50
                February   227.75
                March      159.75
                April      203.25
                May       199.90
                Total     1000.34

C               January   203.50
                February   227.75
                March      159.75
                April      203.25
                May       199.90
                Total     1000.34

Any help will be greatly appreciated


Solution

  • Update to add Total at end of each client

    vp_clients['month'] = pd.Categorical(vp_clients['month'], 
                                         ordered=True, 
                                         categories=['January','February','March',
                                                     'April','May','June','July',
                                                     'August','September','October',
                                                     'November','December','Total'])
    
    df = pd.pivot_table(vp_clients, values='hours', index=['client', 'month'], aggfunc=sum)
    
    df = df.dropna()
    
    pd.concat([df,df.sum(level=0).assign(month='Total').set_index('month', append=True)]).sort_index()
    

    Output:

                     hours
    client month          
    A      January   11.25
           February  16.25
           April     11.25
           August    11.25
           December  16.25
           Total     66.25
    B      January   11.25
           February  16.25
           April      8.25
           August    11.25
           December  16.25
           Total     63.25
    C      January   11.25
           February  16.25
           April     14.25
           August    11.25
           December  16.25
           Total     69.25
    

    Let's use pd.Categorical:

    vp_clients['month'] = pd.Categorical(vp_clients['month'], 
                                         ordered=True, 
                                         categories=['January','February','March',
                                                     'April','May','June','July',
                                                     'August','September','October',
                                                     'November','December'])
    
    df = pd.pivot_table(vp_clients, values='hours', index=['client', 'month'], aggfunc=sum)
    
    df.dropna()
    

    Output:

                     hours
    client month          
    A      January   11.25
           February  16.25
           April     11.25
           August    11.25
           December  16.25
    B      January   11.25
           February  16.25
           April      8.25
           August    11.25
           December  16.25
    C      January   11.25
           February  16.25
           April     14.25
           August    11.25
           December  16.25