machine-learning, encoding, neural-network, one-hot-encoding

When the label dimension is too big and you want an alternative to one-hot encoding


I am a beginner learning machine learning.

I am trying to build a model (a feed-forward neural network, FNN), and it has too many output labels to use one-hot encoding.

Could you help me?

I want to solve this problem: the labels describe fruits:

Type (Apple, Grapes, Peach), Quality (Good, Normal, Bad), Price (Expensive, Normal, Cheap), Size (Big, Normal, Small)

So if I use one-hot encoding, the label size goes up to 3 * 3 * 3 * 3 = 81.

I think the label data looks like a sequence of 4 one-hot encodings.

Is there any way to encode the labels in a smaller dimension, rather than as an 81-dimensional one-hot encoding?

I think binary encoding could also be used, but I have read that it has some shortcomings when used with neural networks.

Thanks :D


Solution

  • If you one-hot encode your 4 variables you will have 3 + 3 + 3 + 3 = 12 variables, not 81.

    The idea is that you need to create a binary variable for every category within each categorical feature, not one for every possible combination of categories across the four features (a small sketch of this follows below).

    Nevertheless, other possible approaches are Numerical Encoding, Binary Encoding (as you mentioned), or Frequency Encoding (replace each category with its frequency in the dataset). The results often depend on the problem, so try different approaches and see what fits yours best (a frequency-encoding sketch also follows below)!
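Here is a minimal sketch of the per-feature one-hot encoding described above, in plain Python with NumPy. The category lists come from the question; the feature names, their ordering, and the `encode_label` helper are just illustrative assumptions, not a fixed API.

```python
import numpy as np

# Category lists taken from the question; order within each list is arbitrary.
FEATURES = {
    "type":    ["Apple", "Grapes", "Peach"],
    "quality": ["Good", "Normal", "Bad"],
    "price":   ["Expensive", "Normal", "Cheap"],
    "size":    ["Big", "Normal", "Small"],
}

def encode_label(label: dict) -> np.ndarray:
    """Concatenate one one-hot block per feature -> a 12-dimensional vector."""
    blocks = []
    for name, categories in FEATURES.items():
        block = np.zeros(len(categories), dtype=np.float32)
        block[categories.index(label[name])] = 1.0  # one-hot within this feature only
        blocks.append(block)
    return np.concatenate(blocks)

label = {"type": "Peach", "quality": "Good", "price": "Cheap", "size": "Small"}
vec = encode_label(label)
print(vec)        # [0. 0. 1. 1. 0. 0. 0. 0. 1. 0. 0. 1.]
print(vec.shape)  # (12,)
```

If these 12 values are used as the network's output, one common setup is four groups of 3 outputs, each trained with its own softmax and cross-entropy loss, instead of a single 81-way softmax.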
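And a minimal frequency-encoding sketch (the toy `types` column below is made up purely for illustration): each category is replaced by its relative frequency in the dataset, so each feature collapses to a single number.

```python
from collections import Counter

# Hypothetical column of the "type" feature.
types = ["Apple", "Apple", "Grapes", "Peach", "Apple", "Grapes"]

counts = Counter(types)
total = len(types)
freq = {category: count / total for category, count in counts.items()}

encoded = [freq[t] for t in types]
print(freq)     # {'Apple': 0.5, 'Grapes': 0.333..., 'Peach': 0.166...}
print(encoded)  # [0.5, 0.5, 0.333..., 0.166..., 0.5, 0.333...]
```

Whether this works well for output labels (rather than input features) depends on the task, so, as noted above, it is worth comparing the different encodings on your own data.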