The train_set is:
type
0 a
1 b
2 c
3 d
4 e
If I use pd.get_dummies, I will get 5 columns:
type_a type_b type_c type_d type_e
0 1 0 0 0 0
1 0 1 0 0 0
2 0 0 1 0 0
3 0 0 0 1 0
4 0 0 0 0 1
The test_set is:
type
0 a
1 b
2 c
3 d
If I use pd.get_dummies, I will get only 4 columns:
type_a type_b type_c type_d
0 1 0 0 0
1 0 1 0 0
2 0 0 1 0
3 0 0 0 1
I want it to be:
type_a type_b type_c type_d type_e
0 1 0 0 0 0
1 0 1 0 0 0
2 0 0 1 0 0
3 0 0 0 1 0
You can try reindex with all the desired columns and fill_value=0:
pd.get_dummies(test_set).reindex(
["type_a", "type_b", "type_c", "type_d", "type_e"], axis=1, fill_value=0)
Output:
# type_a type_b type_c type_d type_e
# 0 1 0 0 0 0
# 1 0 1 0 0 0
# 2 0 0 1 0 0
# 3 0 0 0 1 0
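If you would rather not type the column list by hand, one option is to take it from the encoded training set. The following is a minimal sketch, assuming train_set and test_set are DataFrames shaped like the ones in the question; the names train_dummies and test_dummies are made up for illustration. dtype=int is passed on the assumption that you want 0/1 integers, since recent pandas versions return boolean dummy columns by default.

```python
import pandas as pd

# Example frames mirroring the question (assumed shapes).
train_set = pd.DataFrame({"type": ["a", "b", "c", "d", "e"]})
test_set = pd.DataFrame({"type": ["a", "b", "c", "d"]})

# Encode the training set first; its columns define the target layout.
train_dummies = pd.get_dummies(train_set, dtype=int)

# Encode the test set, then align it to the training columns.
# Columns missing from the test set (here type_e) are filled with 0.
test_dummies = pd.get_dummies(test_set, dtype=int).reindex(
    columns=train_dummies.columns, fill_value=0
)

print(test_dummies)
```

Note that reindex also drops any column that is not in the list, so a category that appears only in the test set would be silently discarded by this approach.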