Search code examples
pythonpandasdataframeknime

Pandas Dataframe (Object) - Count occurrence of values and and group in columns


in the Dataframe I have the field Equipment. In this field are comma sepperated values. Now I like to count the occurrence of each value and create a column based on this value. For example:

Col1 ABS
aaa 1
bbb 1
ccc 0
ddd 0
eee 0
import pandas as pd

data = {'Equipment': ['ABS, Android, Zentralverriegelung', 'Xenon, ABS, Apple', 'Android, Hupe, Blinker', 'Radio, CD', '', 'ABS, Android, Zentralverriegelung'],
        'Col1': ['aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa'],
        'Col2': ['bbb', 'bbb', 'bbb', 'bbb', 'bbb', 'bbb'],
        'Col3': ['ccc', 'ccc', 'ccc', 'ccc', 'ccc', 'ccc'],
        'Col4': ['ddd', 'ddd', 'ddd', 'ddd', 'ddd', 'ddd'],
        'Col5': ['eee', 'eee', 'eee', 'eee', 'eee', 'eee'],
        }

df = pd.DataFrame(data)


split_data = df['Equipment'].str.split(',', expand=True)
split_data['index'] = df.index
result_data = pd.get_dummies(split_data.explode('index')).groupby(level=0).agg(sum)

print(result_data)

My Result is: enter image description here

But I was expecting something like this: enter image description here

BR


Solution

  • Maybe you are looking for str.get_dummies:

    >>> df['Equipment'].str.get_dummies(sep=', ')
    
       ABS  Android  Apple  Blinker  CD  Hupe  Radio  Xenon  Zentralverriegelung
    0    1        1      0        0   0     0      0      0                    1
    1    1        0      1        0   0     0      0      1                    0
    2    0        1      0        1   0     1      0      0                    0
    3    0        0      0        0   1     0      1      0                    0
    4    0        0      0        0   0     0      0      0                    0
    5    1        1      0        0   0     0      0      0                    1