Search code examples
pythonpandaspivot-table

How to split a cell and count all the unique values?


How to split a cell and count all the unique values?

Hello there! I have a table from the questionnaire. Several questions contain multiple response and the answers are collected as one person-one answer (in one cell).

id What fruits do you like?
122 apple, grapes, kiwi
412 kiwi, mango, lemon
... ...

So I tried to split all the cells to make a table and got (example):

data['What fruits do you like'].str.split(',',expand=True)

index 1 2 3
0 apple grapes kiwi
1 kiwi mango lemon
... ... ... ...

And now I'd like to make a table, where indexes are name of fruit and values are quantity of met fruit in splitted table:

fruit quantity
apple 1
kiwi 2
grapes 1
... ...

I tried to split a table and make a pivot_table, but I don't understand how to make a pivot only have values and indexes from 0 to 10 in columns. I tried to write for cycle, but don't know how to collect the data.


Solution

  • split, explode, then value_counts:

    out = (df['What fruits do you like?'].str.split(', *')
           .rename('fruit')
           .explode().value_counts()
           .reset_index(name='quantity')
          )
    

    Output:

        fruit  quantity
    0    kiwi         2
    1   apple         1
    2  grapes         1
    3   mango         1
    4   lemon         1