python pandas scikit-learn dummy-variable

Convert categorical data into dummy set

I have data like this:

|--------|---------|
| Col1   | Col2    |
|--------|---------|
| X      | a,b,c   |
|--------|---------|
| Y      | a,b     |
|--------|---------|
| X      | b,d     |
|--------|---------|

I want to convert this categorical data to dummy variables. Because my data is large, pandas' get_dummies() raises a MemoryError. I want my result to look like this:

|------|------|------|------|------|------|
|Col_X |Col_Y |Col2_a|Col2_b|Col2_c|Col2_d|
|------|------|------|------|------|------|
|  1   |  0   |  1   |  1   |  1   |  0   |
|------|------|------|------|------|------|
|  0   |  1   |  1   |  1   |  0   |  0   |
|------|------|------|------|------|------|
|  1   |  0   |  0   |  1   |  0   |  1   |
|------|------|------|------|------|------|

I have tried to convert Col2 using this, but I get a MemoryError because the data is large and Col2 has a lot of distinct values.

So,

1) How can I convert multiple categorical columns into dummy variables?

2) pandas get_dummies() raises a MemoryError, so how can I handle that?

Solution

I would like to share my own solution as well, and to thank @James-dellinger for the answer. Here is my approach:

import pandas as pd

df = pd.DataFrame({'Col1': ['X', 'Y', 'X'],
                   'Col2': ['a,b,c', 'a,b', 'b,d']})
df

  Col1  Col2
0   X   a,b,c
1   Y   a,b
2   X   b,d

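I first split the Col2 values on commas and spread the pieces across separate columns. Note that this overwrites df, so the original Col1 and Col2 are no longer in it.

# n is passed as a keyword; newer pandas versions no longer accept it positionally
df = pd.DataFrame(df['Col2'].str.split(',', n=3).tolist(), columns=['Col1', 'Col2', 'Col3'])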

df

   Col1 Col2 Col3
0   a   b    c
1   a   b    None
2   b   d    None

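Then I created dummy variables from this dataframe with an empty prefix, so each dummy column is named only after its value (with a leading underscore from the default separator).

# dtype=int keeps the 1/0 output shown below; newer pandas versions default to True/False
df = pd.get_dummies(df, prefix="", dtype=int)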

df

    _a  _b  _b  _d  _c
0   1   0   1   0   1
1   1   0   1   0   0
2   0   1   0   1   0

Now to get the desired result we can sum up all the duplicate columns.

# assign the result back; transposing avoids groupby(axis=1), which is deprecated in newer pandas
df = df.T.groupby(level=0).sum().T

df

    _a  _b  _c  _d
0   1   1   1   0
1   1   1   0   0
2   0   1   0   1
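
As a side note, the split, dummy, and sum steps above can likely be collapsed into a single call with Series.str.get_dummies(). The sketch below is an alternative, not the approach used above; df_raw is a hypothetical name for the original frame, and the '_' prefix is added only to match the column names shown. It still builds a dense frame, so on its own it is not expected to fix the MemoryError.

```python
import pandas as pd

# df_raw is a hypothetical name for the original, unmodified data
df_raw = pd.DataFrame({'Col1': ['X', 'Y', 'X'],
                       'Col2': ['a,b,c', 'a,b', 'b,d']})

# Split each string on ',' and build one indicator column per distinct value
col2_dummies = df_raw['Col2'].str.get_dummies(sep=',')

# Add the same leading underscore the get_dummies(prefix="") step produces
col2_dummies = col2_dummies.add_prefix('_')

print(col2_dummies)
```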

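For Col1, we can create the dummy variables directly by calling pd.get_dummies() on the original Col1 column (df was overwritten above, so keep a copy of the original data) and store them in a separate dataframe, say col1_df. We can then concatenate the two dataframes using pd.concat([df, col1_df], axis=1, sort=False)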
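
A minimal end-to-end sketch of this last step, under a few assumptions: the original data is kept under a separate name (df_raw, hypothetical) because the code above overwrites df; the prefixes 'Col' and 'Col2' are chosen to match the column names in the question; and col1_df is placed first in pd.concat() only to reproduce the column order of the desired result.

```python
import pandas as pd

# df_raw is a hypothetical name for the original, unmodified data
df_raw = pd.DataFrame({'Col1': ['X', 'Y', 'X'],
                       'Col2': ['a,b,c', 'a,b', 'b,d']})

# Col2: split on commas, one-hot encode every piece, then merge
# duplicate dummy columns by summing them (same idea as the steps above)
parts = pd.DataFrame(df_raw['Col2'].str.split(',').tolist())
col2_df = pd.get_dummies(parts, prefix='Col2', dtype=int)
col2_df = col2_df.T.groupby(level=0).sum().T

# Col1: a single value per row, so get_dummies can be applied directly
col1_df = pd.get_dummies(df_raw['Col1'], prefix='Col', dtype=int)

# Put both sets of dummy columns side by side
result = pd.concat([col1_df, col2_df], axis=1, sort=False)
print(result)
```

If memory is still a problem, pd.get_dummies() also accepts sparse=True, which may be worth trying for the single-valued column.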