I have the data as shown in the table. I want to use Python. For all the fruits that exist in the year 2016 and 2017, I want the frequencies of country in 2015 for those fruits.
Country | Fruit | Year |
---|---|---|
Germany | Apple | 2015 |
France | Apple | 2015 |
France | Apple | 2015 |
Spain | Apple | 2015 |
Germany | Banana | 2015 |
France | Banana | 2015 |
France | Apple | 2016 |
Spain | Apple | 2016 |
Germany | Banana | 2016 |
France | Banana | 2016 |
France | Banana | 2017 |
France | Grapes | 2017 |
The final table I want looks like below:
Fruit | Germany | France | Spain |
---|---|---|---|
Apple | 1 | 2 | 1 |
Banana | 1 | 1 | 0 |
Grapes | 0 | 0 | 0 |
Filter by Year and then pivot it with pivot_table
:
(df[df.Year == 2015]
.pivot_table('Year', 'Fruit', 'Country', aggfunc='count')
.reindex(
index=df.Fruit.unique(),
columns=df.Country.unique()
).fillna(0)
.reset_index())
Country Fruit Germany France Spain
0 Apple 1.0 2.0 1.0
1 Banana 1.0 1.0 0.0
2 Grapes 0.0 0.0 0.0
Another option is to use crosstab
and then select 2015 from the result:
(pd.crosstab(df.Fruit, [df.Country, df.Year])
.loc[:, pd.IndexSlice[:, 2015]]
.droplevel(1, 1)
.reset_index())
Country Fruit France Germany Spain
0 Apple 2 1 1
1 Banana 1 1 0
2 Grapes 0 0 0