machine-learning, normalization, scaling, data-science

Data Science Scaling/Normalization real case


When doing data pre-processing, it is suggested to apply either scaling or normalization. This is easy when you have all the data on hand and can do it right away. But after the model is built and running, does the first incoming data point need to be scaled or normalized? If so, and it is only a single row, how do we scale or normalize it? How do we know the min/max/mean/stdev of each feature? And what should the min/max/mean of each feature be for the incoming data?

Please advise


Solution

  • Yes, you need to apply the same normalization to incoming data; otherwise the model receives inputs on a different scale from the one it was trained on, and its predictions will be unreliable.

    You also have to save the normalization coefficients computed from the training data. Then apply those same coefficients to every incoming row, even when it is a single row; do not recompute them from the new data.

    For example if you use min-max normalization:

    f_n = (f - min(f)) / (max(f) - min(f))

    Then you need to save min(f) and max(f), as computed on the training data, in order to normalize new data.
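
A minimal sketch of this idea in plain Python, assuming the training data is a list of numeric rows. The helper names `fit_min_max` and `transform_min_max` are hypothetical, and storing the coefficients as JSON is just one possible choice:

```python
import json


def fit_min_max(rows):
    """Compute per-feature min and max from the training rows."""
    columns = list(zip(*rows))
    return {"min": [min(c) for c in columns], "max": [max(c) for c in columns]}


def transform_min_max(row, coeffs):
    """Scale a single row using the saved training min/max."""
    scaled = []
    for value, lo, hi in zip(row, coeffs["min"], coeffs["max"]):
        span = hi - lo
        # Assumption: a constant feature (span == 0) is mapped to 0.0
        # to avoid dividing by zero.
        scaled.append((value - lo) / span if span else 0.0)
    return scaled


# Training time: compute the coefficients once, from the training data only.
train = [[1.0, 200.0], [3.0, 400.0], [5.0, 600.0]]
coeffs = fit_min_max(train)

# Persist the coefficients next to the model (a JSON string here; a file works too).
saved = json.dumps(coeffs)

# Prediction time: reload the coefficients and scale one incoming row.
loaded = json.loads(saved)
new_row = [2.0, 700.0]
scaled_row = transform_min_max(new_row, loaded)
print(scaled_row)  # [0.25, 1.25]
```

The second value is above 1 because 700 is larger than the training maximum of 600. With this approach, incoming values outside the training range simply fall outside [0, 1]. The same save-and-reuse pattern applies to mean/stdev standardization.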