
Normalizing Input for Machine Learning Algorithm


I would like to normalize (z-score, minmax, etc.) my predictor variables for several machine learning algorithms (e.g. a neural network) and a logistic regression, and I am wondering:

1) Should I normalize all of the predictor data, that is, training AND test data?

2) Should I normalize my predicted variable, y?


Solution

  • 1) The correct procedure is to compute the transformation parameters (for minmax, the minimum and maximum) on your training data only, and then use those same parameters to normalize the test data. Here is an example of a minmax normalization with one feature:

    training = [1, 2, 3]
    test = [0, 4]
    

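    With the training minimum 1 and maximum 3, each value x is mapped to (x - 1) / (3 - 1), so the normalized data are the following (note that test values can fall outside [0, 1]):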

    training_normalized = [0.0, 0.5, 1.0]
    test_normalized = [-0.5, 1.5]
    

    2) Generally the answer is no, but there are cases where it may help to transform the target variable. In any case, make sure that your model's output can actually reach the range of the target variable.
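
As a rough sketch of the procedure in plain Python, the minmax example from point 1 could look like the following. The helper names `fit_minmax`, `minmax_transform` and `minmax_inverse` are made up for this illustration and do not come from any library, and the target values at the end are hypothetical numbers chosen only to show how scaled predictions would be mapped back.

```python
def fit_minmax(values):
    """Return the (min, max) parameters learned from the given values."""
    return min(values), max(values)


def minmax_transform(values, lo, hi):
    """Scale values with previously fitted parameters: (x - lo) / (hi - lo)."""
    span = hi - lo
    if span == 0:
        raise ValueError("cannot min-max scale a constant feature")
    return [(x - lo) / span for x in values]


def minmax_inverse(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]


training = [1, 2, 3]
test = [0, 4]

# Fit the parameters on the training data only.
lo, hi = fit_minmax(training)

# Apply the same parameters to both sets.
training_normalized = minmax_transform(training, lo, hi)
test_normalized = minmax_transform(test, lo, hi)

print(training_normalized)  # [0.0, 0.5, 1.0]
print(test_normalized)      # [-0.5, 1.5]

# If the target is scaled as well (point 2), predictions have to be mapped
# back to the original units. These numbers are hypothetical.
y_train = [10.0, 20.0]
y_lo, y_hi = fit_minmax(y_train)
predictions_scaled = [0.25]  # stand-in for a model's output on the scaled target
predictions = minmax_inverse(predictions_scaled, y_lo, y_hi)
print(predictions)  # [12.5]
```

The test set never influences `lo` and `hi`, which is why its normalized values are allowed to land outside [0, 1].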