Search code examples
pythonpandasdataframegrouping

Grouping of dataframe to find the count and sum on columns


I have a dataframe like

       customer        fruit    price
0      cust1           mango     30
1      cust2           apple     45
2      cust1           banana    55
3      cust3           mango     22
4      cust4           banana    54
5      cust3           apple     55
6      cust2           apple     90
7      cust1           mango     45
8      cust3           banana    45
9      cust2           mango     23
10     cust4           mango     44

I need to get how much each customer spent on buying mango and on other fruits (i.e. not mango itself as a category) and count purchases by each customer, again, mango as its own category and other fruits in a separate column. Something like:

      customer   price spent_on_mango  spent_on_others
0      cust1          75                    55   
1      cust2          23                    135       
2      cust3          22                    100
3      cust4          44                    54

Solution

  • Why don't you create a column to indicate whether or not the fruit is a mango and then include that in your groupby?

    df['mango'] = df.fruit == 'mango'
    df2 = df.groupby(['customer', 'mango']).sum().unstack()
    df2.columns = ['not mango', 'mango']
    
    >>> df2
              not mango  mango
    customer                  
    cust1            55     75
    cust2           135     23
    cust3           100     22
    cust4            54     44