Hello there! I have a table from the questionnaire. Several questions contain multiple response and the answers are collected as one person-one answer (in one cell).
id | What fruits do you like? |
---|---|
122 | apple, grapes, kiwi |
412 | kiwi, mango, lemon |
... | ... |
So I tried to split all the cells to make a table and got (example):
data['What fruits do you like'].str.split(',',expand=True)
index | 1 | 2 | 3 |
---|---|---|---|
0 | apple | grapes | kiwi |
1 | kiwi | mango | lemon |
... | ... | ... | ... |
And now I'd like to make a table, where indexes are name of fruit and values are quantity of met fruit in splitted table:
fruit | quantity |
---|---|
apple | 1 |
kiwi | 2 |
grapes | 1 |
... | ... |
I tried to split a table and make a pivot_table, but I don't understand how to make a pivot only have values and indexes from 0 to 10 in columns. I tried to write for cycle, but don't know how to collect the data.
split
, explode
, then value_counts
:
out = (df['What fruits do you like?'].str.split(', *')
.rename('fruit')
.explode().value_counts()
.reset_index(name='quantity')
)
Output:
fruit quantity
0 kiwi 2
1 apple 1
2 grapes 1
3 mango 1
4 lemon 1