
Get dummy variables in Pandas where rows contain multiple variables as a list?

Consider a pandas DataFrame with a column 'id' in which each row holds a comma-separated string of category labels, so a single row can belong to several categories. What is an efficient way to obtain the dummy variables, i.e. one 0/1 indicator column per category?

Example:

Input:

import pandas as pd

df1 = pd.DataFrame({'id': ['0,1', '24,25', '1,24']})

Output:

df2 = pd.DataFrame({'0':  [1, 0, 0],
                    '1':  [1, 0, 1],
                    '24': [0, 1, 1],
                    '25': [0, 1, 0]})

Solution

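Use the string-accessor version of get_dummies, Series.str.get_dummies. It splits each string on sep and returns one 0/1 column per distinct label, with the columns sorted: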

df1['id'].str.get_dummies(sep=',')

The resulting output:

   0  1  24  25
0  1  1   0   0
1  0  0   1   1
2  0  1   1   0
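The following is a self-contained check that the one-liner reproduces df2 from the question. For comparison it also includes an alternative route that the original answer does not use: split the strings into lists, explode them to one label per row, one-hot encode with pd.get_dummies, and collapse back with groupby(level=0).max(). The names dummies, exploded and alt are illustrative only. This is a sketch rather than a benchmark of which approach is faster.

```python
import pandas as pd

# Input from the question: each cell is a comma-separated string of labels.
df1 = pd.DataFrame({'id': ['0,1', '24,25', '1,24']})

# Expected output from the question.
df2 = pd.DataFrame({'0':  [1, 0, 0],
                    '1':  [1, 0, 1],
                    '24': [0, 1, 1],
                    '25': [0, 1, 0]})

# Approach from the answer: split on ',' and build one 0/1 column per label.
dummies = df1['id'].str.get_dummies(sep=',')

# Alternative route (not from the original answer): one label per row,
# one-hot encode, then collapse back to one row per original index.
# max() keeps the result 0/1 even if a label repeats within a cell.
exploded = df1['id'].str.split(',').explode()
alt = pd.get_dummies(exploded).groupby(level=0).max().astype(int)

# Values and labels match. Dtype checks are relaxed because the integer
# or boolean dtype produced can vary between pandas versions and platforms.
pd.testing.assert_frame_equal(dummies, df2, check_dtype=False)
pd.testing.assert_frame_equal(alt, df2, check_dtype=False)
```

Neither version strips whitespace. Input such as '0, 1' would produce a separate ' 1' column, so clean the strings first (for example with .str.replace(' ', '', regex=False)) if the data may contain spaces.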