Consider a Pandas dataframe with a column 'id' in which each row holds a comma-separated string of category labels. What is an efficient way to obtain the dummy (0/1 indicator) variables?
Example:
Input:
import pandas as pd
df1 = pd.DataFrame({'id': ['0,1', '24,25', '1,24']})
Output:
df2 = pd.DataFrame({'0':  [1, 0, 0],
                    '1':  [1, 0, 1],
                    '24': [0, 1, 1],
                    '25': [0, 1, 0]})
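To make the desired transformation concrete, here is a minimal, explicit sketch of it in plain Python over the Series. It is only an illustration of what "dummy variables" means for this data (not how pandas implements anything, and not the efficient approach being asked for); the name `manual` is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['0,1', '24,25', '1,24']})

# Split every cell into a list of its individual category labels
split = df1['id'].str.split(',')

# Collect the distinct labels; sorted() orders them as strings
labels = sorted({label for row in split for label in row})

# One indicator column per label: 1 if the row contains that label, else 0
manual = pd.DataFrame({label: [int(label in row) for row in split]
                       for label in labels})
print(manual)
```

This loops in Python over every (label, row) pair, so it scales poorly with many categories or rows.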
Use the .str accessor version of get_dummies:
df1['id'].str.get_dummies(sep=',')
The resulting output:
   0  1  24  25
0  1  1   0   0
1  0  0   1   1
2  0  1   1   0
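A self-contained sketch that runs the one-liner on the question's input and checks it against the expected df2 (comparing column labels and values rather than dtypes, which can differ across pandas versions):

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['0,1', '24,25', '1,24']})

# Split each string on ',' and build one 0/1 column per distinct label
dummies = df1['id'].str.get_dummies(sep=',')

# Expected result from the question
df2 = pd.DataFrame({'0':  [1, 0, 0],
                    '1':  [1, 0, 1],
                    '24': [0, 1, 1],
                    '25': [0, 1, 0]})

# Same labels in the same order, and the same indicator values
print(list(dummies.columns) == list(df2.columns))    # True
print((dummies.to_numpy() == df2.to_numpy()).all())  # True
```

Note that the resulting column labels are strings sorted lexicographically, so a label such as '100' would come before '2'; reorder the columns afterwards if numeric order matters.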