I need to create dummy variables for a column that can take 16 values (0-15) but does not necessarily contain all 16 of them when I build the dummies:
my_column
0 3
1 4
2 7
3 1
4 9
I expect my dummy variables to have 16 columns (or more: any number of values that I fix in advance), with the number in each column name corresponding to the value of my_column. But if my_column contains only, say, 5 of the 16 possible values, pd.get_dummies will create only 5 columns (which is the expected behavior of this method), as follows:
my_column 1 3 4 7 9
0 3 0 1 0 0 0
1 4 0 0 1 0 0
2 7 0 0 0 1 0
3 1 1 0 0 0 0
4 9 0 0 0 0 1
How can I achieve the following result?
my_column 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 3 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
1 4 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
2 7 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
3 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 9 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
Use get_dummies + reindex on the columns:
v = pd.get_dummies(df.my_column).reindex(columns=range(0, 16), fill_value=0)
According to the docs, reindex will:
Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index.
Passing fill_value=0 fills the newly added columns with zeros instead of NaN.
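As a self-contained sketch of the same idea, using the sample data from the question: the names dummies and all_values are only illustrative, and dtype=int is an addition of mine. In recent pandas versions (2.0 and later, as far as I know) get_dummies returns boolean columns by default, so dtype=int is one way to keep the 0/1 output shown here.

```python
import pandas as pd

# Sample data from the question: only 5 of the 16 possible values occur
df = pd.DataFrame({"my_column": [3, 4, 7, 1, 9]})

# Full set of values fixed in advance (illustrative name)
all_values = range(16)

# get_dummies only creates columns for the values that are present: 1, 3, 4, 7, 9
dummies = pd.get_dummies(df["my_column"], dtype=int)

# reindex adds the missing labels as new columns and fills them with 0
v = dummies.reindex(columns=all_values, fill_value=0)

print(v.shape)
```

The same pattern works for any fixed set of values: change all_values and the result always has exactly those columns, in that order.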
You can add the original column to the result with insert or concat:
v.insert(0, 'my_column', df.my_column)  # modifies v in place
# v = pd.concat([df, v], axis=1)  # alternative to insert; use one or the other, not both
v
my_column 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 3 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
1 4 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
2 7 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
3 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 9 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
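A different technique from the reindex approach above, which may also be worth knowing: declare the full set of categories up front and let get_dummies create one column per category, whether or not it occurs. This is only a sketch under the same sample data; the names full_dtype and s are illustrative, and dtype=int is again my addition to keep 0/1 output.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"my_column": [3, 4, 7, 1, 9]})

# Categorical dtype listing all 16 allowed values (illustrative name)
full_dtype = pd.CategoricalDtype(categories=list(range(16)))

# Convert the column so it carries the full category list
s = df["my_column"].astype(full_dtype)

# get_dummies emits one column per declared category, including unused ones
v = pd.get_dummies(s, dtype=int)

print(v.shape)
```

One difference to keep in mind: a value outside the declared categories becomes NaN during the conversion, so its row ends up all zeros.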