I tried to apply the pandas get_dummies function to my dataset. The problem is that the number of category values does not match between the train set and the valid set. For example, a train set column has 5 distinct values, e.g. [1, 2, 3, 4, 5], but the same column in the valid set has only 3, e.g. [1, 3, 5].
When I built the model on the train dataset, 5 dummies were created, e.g. dum_1, dum_2, dum_3, dum_4, dum_5.
So if I apply the same function to the valid dataset, only 3 dummies are created, e.g. dum_1, dum_3, dum_5.
As a result, I cannot use my model to predict on the valid dataset. How can I create the same dummies for the train and valid sets? (Concatenating the two datasets is not possible, so please suggest a method other than pd.concat.)
Also, if I simply add the new columns to the valid set, I expect the result to be different, because the order of the dummies would not match between the train and valid sets.
Thanks.
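All you need to do is add the dummy columns that are missing from valid, fill them with 0, and then put the columns in the same order as train: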
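# Dummy columns that exist in train but not in valid
missing_cols = [col for col in train.columns if col not in valid.columns]
for col in missing_cols:
    valid[col] = 0
# Keep exactly train's columns, in train's order (single brackets)
valid = valid[train.columns]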
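The same alignment can probably be done in one step with DataFrame.reindex, which also takes care of the column order. This is only a minimal sketch: the toy frames train_raw and valid_raw and the column name "dum" are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical toy data; the column name "dum" is an assumption.
train_raw = pd.DataFrame({"dum": [1, 2, 3, 4, 5]})
valid_raw = pd.DataFrame({"dum": [1, 3, 5]})

# Passing columns= makes get_dummies encode the integer column.
train = pd.get_dummies(train_raw, columns=["dum"])
valid = pd.get_dummies(valid_raw, columns=["dum"])

# Align valid to train's columns: dummies missing from valid are added
# and filled with 0, columns not present in train are dropped, and the
# resulting order is the same as in train.
valid = valid.reindex(columns=train.columns, fill_value=0)

print(list(valid.columns))
# ['dum_1', 'dum_2', 'dum_3', 'dum_4', 'dum_5']
```

Because reindex drops any column that train does not have, a category that appears only in the valid set ends up with all zeros in every dummy, which may or may not be what you want.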