python pandas categorical-data dummy-variable

Get dummy variables in Pandas where rows contain multiple variables as a list?


Consider a Pandas dataframe with a column 'id' in which each row holds several categories (in the example below, as a comma-separated string). What is an efficient way to obtain the dummy variables?

Example:

Input:

import pandas as pd
df1 = pd.DataFrame({'id': ['0,1', '24,25', '1,24']})

Output:

df2 = pd.DataFrame({'0':  [1, 0, 0],
                    '1':  [1, 0, 1],
                    '24': [0, 1, 1],
                    '25': [0, 1, 0]})

Solution

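  • Use Series.str.get_dummies, the .str accessor version of get_dummies, which splits each string on sep and returns one indicator column per distinct value: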

    df1['id'].str.get_dummies(sep=',')
    

    The resulting output:

       0  1  24  25
    0  1  1   0   0
    1  0  0   1   1
    2  0  1   1   0
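
Since the title asks about rows that hold a list, here is a minimal sketch of how the same call might be applied when the column contains actual Python lists rather than comma-separated strings. The df_lists frame and the variable names are made up for illustration, and the approach assumes no category itself contains the separator:

```python
import pandas as pd

# Case from the question: each row is a comma-separated string.
df1 = pd.DataFrame({'id': ['0,1', '24,25', '1,24']})
dummies = df1['id'].str.get_dummies(sep=',')

# Hypothetical variant: each row is a real Python list of strings.
df_lists = pd.DataFrame({'id': [['0', '1'], ['24', '25'], ['1', '24']]})

# Join each list into one delimited string, then split it back into
# indicator columns. Assumes ',' never appears inside a category name.
dummies_from_lists = df_lists['id'].str.join(',').str.get_dummies(sep=',')

print(dummies_from_lists)
```

Both paths should give the same indicator columns. If the dummies are needed alongside the original column, pd.concat([df1, dummies], axis=1) is one way to attach them.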