Search code examples
pythonpandasindexingsummary

Referencing aggregate column of a groupby result


I have a dataframe like so:

testdf

  category  item  inventory  sold
0        A   Low        100    50
1        A  High        200    75
2        A   Med        130    20
3        A  Misc        435   150
4        A  High        130    65
5        A  Misc        120    88    
6        B  Misc        321   230    
7        B  High        453   450
8        B   Low        321   301
9        B   Low        122    80

I am grouping it based on category and item

dfindx = testdf.groupby(['category','item']).agg({'inventory':['count','sum']})

dfindx

              inventory     
                  count  sum
category item               
A        High         2  330
         Low          1  100
         Med          1  130
         Misc         2  555
B        High         1  453
         Low          2  443
         Misc         1  321

Now on aggregating the indexed table I am facing this error:

dfindx.groupby(['category'])[['count']].sum()

KeyError: "Columns not found: 'count'"

dfindx.columns 

MultiIndex(levels=[['inventory'], ['count', 'sum']],
           labels=[[0, 0], [0, 1]])

I an unable to make the reference to counts and sums correctly to summarize a grouped table.

dfindx.groupby(['category'])[inventory['count']].sum()

NameError: name 'inventory' is not defined

dfindx.groupby(['category'])[['inventory']['count']].sum()

TypeError: list indices must be integers or slices, not str


Solution

  • I think you can simplify first aggregation by aggregate by list with column inventory:

    dfindx = testdf.groupby(['category','item'])['inventory'].agg(['count','sum'])
    print (dfindx)
                   count  sum
    category item            
    A        High      2  330
             Low       1  100
             Med       1  130
             Misc      2  555
    B        High      1  453
             Low       2  443
             Misc      1  321
    

    and then next aggregation working nice:

    print(dfindx.groupby(['category'])[['count']].sum())
              count
    category       
    A             6
    B             4
    

    Buth if want use your code with MultiIndex output use tuple for select in next aggregation:

    dfindx = testdf.groupby(['category','item']).agg({'inventory':['count','sum']})
    print (dfindx)
                  inventory     
                      count  sum
    category item               
    A        High         2  330
             Low          1  100
             Med          1  130
             Misc         2  555
    B        High         1  453
             Low          2  443
             Misc         1  321
    
    print(dfindx.groupby(['category'])[[('inventory','count')]].sum())
             inventory
                 count
    category          
    A                6
    B                4