I am working on the Titanic Kaggle competition. To deal with categorical data, I split the data into two sets: one for numerical variables and the other for categorical variables. After applying sklearn's one-hot encoding to the categorical set, I tried to regroup the two datasets, but since the categorical set is an ndarray and the other one is a DataFrame, I used:
np.hstack((X_train_num, X_train_cat))
which works perfectly but I no longer have the names of my variables.
Is there another way to do this while maintaining the names of the variables without using pd.get_dummies()?
Thanks
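Try wrapping the encoded array in a DataFrame that shares the numeric frame's index, then joining the two: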
X_train = X_train_num.join(
pd.DataFrame(X_train_cat, X_train_num.index).add_prefix('cat_')
)