python machine-learning label-encoding

Does it make sense to use StandardScaler after applying LabelEncoder?


I'm starting a project on a dataset in which one categorical column contains over 5k unique values.

My question is: after using LabelEncoder to "enumerate" the categories, does it make sense to apply StandardScaler to make the data a little more "manageable" for my machine learning model?

Keep in mind I have over 500k entries in total and 5k unique categories for this particular column.

This is more about the intuition behind it rather than how to code it, but I figured this should be the place to ask.


Solution

  • LabelEncoder should be used for the labels (the target column): it replaces the labels of n categories with integers from 0 to n-1. You should do this if it is not already done.

    StandardScaler is meant for the feature columns of the training and test data, not for the labels. It outputs floats that can be positive or negative.

    You should certainly not apply it to the label column, because the label column must hold discrete class values (the integers produced by LabelEncoder), not continuous floats.
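To make the split described above concrete, here is a minimal sketch using scikit-learn. The toy arrays and the variable names (`X_train`, `X_test`, `y_train`) are assumptions made up for illustration, not data from the question. It encodes the labels with LabelEncoder and scales only the features with StandardScaler.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical toy data: two numeric features and a string-valued label.
X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
X_test = np.array([[2.0, 250.0]])
y_train = ["cat", "dog", "cat"]

# Encode the labels as integers 0..n-1 (classes are sorted alphabetically).
label_encoder = LabelEncoder()
y_train_encoded = label_encoder.fit_transform(y_train)

# Scale the features only: fit on the training set, reuse on the test set.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# The encoded labels stay integers; the scaled features are floats
# with zero mean and unit variance per column (on the training set).
print(y_train_encoded)
print(X_train_scaled.mean(axis=0))
```

Fitting the scaler on the training data and only calling `transform` on the test data keeps test-set statistics out of the preprocessing step. The labels never pass through the scaler, so they remain valid class indices that `label_encoder.inverse_transform` can map back to the original names.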