I have a DataFrame with one column that looks like this:
col
A B C
B C X
U
I would like to generate dummy variables that tell me whether a row contains a specific value. In this example, I would like to generate 5 dummy variables (d_A, d_B, d_C, d_X, d_U) so that the data looks like this:
col d_A d_B d_C d_X d_U
A B C 1 1 1 0 0
B C X 0 1 1 1 0
...
I have many, many possible values, so I cannot do this by hand. Is there a way to do this in pandas in a vectorized way?
Thanks!
Use str.get_dummies, then join or concat:
import pandas as pd

df = pd.DataFrame({'col': ['A B C', 'B C X', 'U']})
print(df.col.str.get_dummies(sep=' '))
A B C U X
0 1 1 1 0 0
1 0 1 1 0 1
2 0 0 0 1 0
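str.get_dummies splits each string on sep before building the indicator columns, and its default separator is '|', which is why sep=' ' has to be passed for space-separated values. A minimal sketch, assuming the same values were stored with the default '|' separator instead:

```python
import pandas as pd

# Same values as in the question, but joined with '|' (the default separator)
s = pd.Series(['A|B|C', 'B|C|X', 'U'])

# No sep argument needed here because '|' is the default
dummies = s.str.get_dummies()
print(dummies)
```

The resulting columns are sorted alphabetically (A, B, C, U, X), just like in the space-separated case.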
print(df.join(df.col.str.get_dummies(sep=' ')))
col A B C U X
0 A B C 1 1 1 0 0
1 B C X 0 1 1 0 1
2 U 0 0 0 1 0
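As a cross-check, the same indicator matrix can be built with a different technique: split each string into tokens, explode them to one row per token, one-hot encode with pd.get_dummies, and collapse back to one row per original index. This is a sketch assuming pandas 0.25 or newer (for Series.explode); str.get_dummies remains the simpler option.

```python
import pandas as pd

df = pd.DataFrame({'col': ['A B C', 'B C X', 'U']})

# One row per token; the original row index is repeated for each token
tokens = df['col'].str.split().explode()

# One-hot encode each token, then take the max per original row
# (bool in pandas >= 2.0, uint8 before), and cast to int for 0/1 output
alt = pd.get_dummies(tokens).groupby(level=0).max().astype(int)
print(alt)
```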
If you need to change the column names, use a list comprehension:
df1 = df.col.str.get_dummies(sep=' ')
df1.columns = ['d_' + x for x in df1.columns]
print(df1)
d_A d_B d_C d_U d_X
0 1 1 1 0 0
1 0 1 1 0 1
2 0 0 0 1 0
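DataFrame.add_prefix is a built-in alternative to the list comprehension: it prepends a string to every column label. A minimal sketch using the question's data:

```python
import pandas as pd

df = pd.DataFrame({'col': ['A B C', 'B C X', 'U']})

# add_prefix('d_') renames A -> d_A, B -> d_B, and so on
df1 = df.col.str.get_dummies(sep=' ').add_prefix('d_')
print(df1)
```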
print(df.join(df1))
col d_A d_B d_C d_U d_X
0 A B C 1 1 1 0 0
1 B C X 0 1 1 0 1
2 U 0 0 0 1 0
print(pd.concat([df, df1], axis=1))
col d_A d_B d_C d_U d_X
0 A B C 1 1 1 0 0
1 B C X 0 1 1 0 1
2 U 0 0 0 1 0
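Both join and concat(axis=1) align on the row index, so here they produce the same frame. get_dummies sorts the new columns alphabetically (d_U before d_X), while the question lists them in order of first appearance (d_A, d_B, d_C, d_X, d_U). A sketch that reorders them, assuming pandas 0.25 or newer for Series.explode:

```python
import pandas as pd

df = pd.DataFrame({'col': ['A B C', 'B C X', 'U']})
df1 = df.col.str.get_dummies(sep=' ')
df1.columns = ['d_' + x for x in df1.columns]

# join and concat(axis=1) both align on the index, so the results match
joined = df.join(df1)
concatenated = pd.concat([df, df1], axis=1)
assert joined.equals(concatenated)

# Unique tokens in order of first appearance: A, B, C, X, U
first_seen = pd.unique(df.col.str.split().explode())

# Reorder the dummy columns to follow that order
result = joined[['col'] + ['d_' + v for v in first_seen]]
print(result)
```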