print(ice_cream_sales_df.head()); print(' ')
sorted_by_flavor = ice_cream_sales_df.sort_values('flavor')
print(sorted_by_flavor); print(' ')
sum_by_flavor = ice_cream_sales_df.groupby('flavor').agg({"units sold":sum})
print(sum_by_flavor); print(' ')
max_by_week = ice_cream_sales_df.groupby('week').agg({"units sold":max})
print(max_by_week);print(' ')
Output:
week units sold flavor
0 1 6 chocolate
1 1 15 lemon
2 1 12 strawberry
3 1 6 vanilla
4 2 16 chocolate
week units sold flavor
0 1 6 chocolate
124 32 2 chocolate
176 45 10 chocolate
36 10 7 chocolate
76 20 11 chocolate
.. ... ... ...
95 24 2 vanilla
91 23 5 vanilla
87 22 6 vanilla
79 20 8 vanilla
207 52 18 vanilla
[208 rows x 3 columns]
units sold
flavor
chocolate 460
lemon 713
strawberry 399
vanilla 527
units sold
week
1 15
2 18
3 9
4 20
5 20
6 11
7 13
8 18
9 12
10 18
11 18
12 16
13 15
14 14
15 20
16 18
17 16
18 20
19 18
20 13
21 18
22 18
23 15
24 19
25 16
26 11
27 14
28 13
29 18
30 20
31 16
32 17
33 16
34 19
35 18
36 18
37 16
38 11
39 15
40 20
41 14
42 17
43 17
44 20
45 15
46 19
47 18
48 15
49 17
50 9
51 20
52 18
I'm self learning python/pandas. I see that you can group by 2 columns, but whenever I try on kaggle, it gives an error.
max_by_week = ice_cream_sales_df.groupby('week','flavor').agg({"units sold":max})
Error:
ValueError: No axis named flavor for object type DataFrame.
What I'm trying to do is list not only the max for each week, but which flavor HAD the max for that week. How can I do this? If I take out the 2nd column whether its week or flavor the code works. But when I try to group by both, it fails.
Try this,
max_by_week = ice_cream_sales_df.groupby(by=['week','flavor']).agg({"units sold":max})
If you want to group more columns, you should pass the columns to by
argument as a list of strings.
From doc
DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)
you can see that axis
is the second argument, which is the reason for your issue