Search code examples
pythonpandasgroup-by

Trying to aggregate with Pandas


print(ice_cream_sales_df.head()); print(' ')

sorted_by_flavor = ice_cream_sales_df.sort_values('flavor')
print(sorted_by_flavor); print(' ')

sum_by_flavor = ice_cream_sales_df.groupby('flavor').agg({"units sold":sum})
print(sum_by_flavor); print(' ')

max_by_week = ice_cream_sales_df.groupby('week').agg({"units sold":max})
print(max_by_week);print(' ')
Output: 
   week  units sold      flavor
0     1           6   chocolate
1     1          15       lemon
2     1          12  strawberry
3     1           6     vanilla
4     2          16   chocolate
 
     week  units sold     flavor
0       1           6  chocolate
124    32           2  chocolate
176    45          10  chocolate
36     10           7  chocolate
76     20          11  chocolate
..    ...         ...        ...
95     24           2    vanilla
91     23           5    vanilla
87     22           6    vanilla
79     20           8    vanilla
207    52          18    vanilla

[208 rows x 3 columns]
 
            units sold
flavor                
chocolate          460
lemon              713
strawberry         399
vanilla            527
 
      units sold
week            
1             15
2             18
3              9
4             20
5             20
6             11
7             13
8             18
9             12
10            18
11            18
12            16
13            15
14            14
15            20
16            18
17            16
18            20
19            18
20            13
21            18
22            18
23            15
24            19
25            16
26            11
27            14
28            13
29            18
30            20
31            16
32            17
33            16
34            19
35            18
36            18
37            16
38            11
39            15
40            20
41            14
42            17
43            17
44            20
45            15
46            19
47            18
48            15
49            17
50             9
51            20
52            18

I'm self learning python/pandas. I see that you can group by 2 columns, but whenever I try on kaggle, it gives an error.

max_by_week = ice_cream_sales_df.groupby('week','flavor').agg({"units sold":max})

Error:

ValueError: No axis named flavor for object type DataFrame.

What I'm trying to do is list not only the max for each week, but which flavor HAD the max for that week. How can I do this? If I take out the 2nd column whether its week or flavor the code works. But when I try to group by both, it fails.


Solution

  • Try this,

    max_by_week = ice_cream_sales_df.groupby(by=['week','flavor']).agg({"units sold":max})
    

    If you want to group more columns, you should pass the columns to by argument as a list of strings.

    From doc

    DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)

    you can see that axis is the second argument, which is the reason for your issue