Some features are numerical, such as "graduation rate from school", while others are categorical, like the name of the school. I used a label encoder to transform the categorical features into integers.
I now have a dataframe with both floats and integers, representing numerical features and categorical features (transformed with the label encoder) respectively.
I am unsure how to proceed with a learner. Do I need to use one-hot encoding, and if so, how? As far as I understand, I cannot simply pass the dataframe to the sklearn OneHotEncoder, since it also contains floats. Do I just apply the label encoder to all features to solve the issue?
Sample data from my dataframe. OPEID and opeid6 were transformed using a label encoder
Just use the categorical_features argument of OneHotEncoder to select which features are categorical:
categorical_features: “all” or array of indices or mask :
Specify what features are treated as categorical.
- ‘all’ (default): All features are treated as categorical.
- array of indices: Array of categorical feature indices.
- mask: Array of length n_features and with dtype=bool.
Non-categorical features are always stacked to the right of the matrix.
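
Note that categorical_features was deprecated in scikit-learn 0.20 and removed in 0.22, so on a recent version the same idea is usually expressed with ColumnTransformer. Below is a minimal sketch of that approach, not a drop-in for your data: the column names "school" and "grad_rate" and all values are made up for illustration. It also assumes scikit-learn 0.20 or later, where OneHotEncoder accepts string categories directly, so the label-encoding step is not needed first.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: column names and values are made up for illustration.
df = pd.DataFrame({
    "school": ["A", "B", "A", "C"],          # categorical feature
    "grad_rate": [0.81, 0.64, 0.90, 0.72],   # numerical feature
})

# Columns to one-hot encode; everything else is left untouched.
categorical_cols = ["school"]

encoder = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    remainder="passthrough",  # numerical columns are passed through unchanged
    sparse_threshold=0,       # always return a dense NumPy array
)

# One-hot columns come first, passed-through columns are stacked to the right.
X = encoder.fit_transform(df)
print(X.shape)
```

Here the three school categories become three indicator columns and grad_rate is appended after them, which mirrors the "non-categorical features are stacked to the right" behaviour described in the quoted docs.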