My question is the following :
I have the following categorical variables in a dataset to predict employee's attrition.
I have currently done one hot encoding of : Job level, Job Role, Marital Status, Over18, Overtime, and keeping the same label encoding for the ordinal columns (PerformanceRating,Relationship Satisfaction and JobSatisfaction).
I will then, after splitting into train and test set, use a Random Forrest Classifier to predict the Attrition (Yes/No).
Am I doing encoding the correct way (one hot for categorical and no encoding for the ordinal columns)?
Thank you so much for helping me with this doubt !
First of all, you should doubt everything and take a performance test.
In general, decision tree models either don't need one-hot encoding or even perform worse after it. There's exceptions, for sure, but in modern ML world tree models if often respected for ability to work with high-dimension categorical data without much of pre-processing.