I have a general question about data pre-processing for machine learning. I know that it is almost a must to center the data around 0 (mean subtraction) and to normalize it (scale to unit variance); there are other possible techniques as well. These steps have to be applied to both the training and validation data sets.
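To make concrete what I mean by these steps, here is a minimal sketch with toy NumPy data (not my actual pipeline; the array shapes are just placeholders):

```python
import numpy as np

# Toy data standing in for my real images (100 RGB training images, 20 validation)
X_train = np.random.rand(100, 32, 32, 3)
X_val = np.random.rand(20, 32, 32, 3)

# Statistics are computed on the training set only
mean = X_train.mean(axis=0)          # per-pixel mean (mean subtraction / centering)
std = X_train.std(axis=0) + 1e-8     # per-pixel std; epsilon avoids division by zero

X_train = (X_train - mean) / std     # centered and scaled to (roughly) unit variance
X_val = (X_val - mean) / std         # validation set transformed with the SAME statistics
```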
I have encountered the following problem. My neural network, trained to classify specific shapes in images, fails to do so if I do not apply these pre-processing techniques to the images that are to be classified. These 'to classify' images are, of course, not contained in the training or validation sets. Hence my question:
Is it normal to apply normalization to the data that is to be classified, or does the poor performance of my network without these techniques mean that my model is bad in the sense that it has failed to generalize and has overfitted?
P.S. With normalization applied to the 'to classify' images, my model performs quite well (about 90% accuracy); without it, accuracy drops below 30%.
Additional info: the model is a convolutional neural network built with Keras and TensorFlow.
It goes without saying (although admittedly it is seldom mentioned explicitly in introductory tutorials, hence the frequent frustration of beginners) that new data fed to the model for classification have to undergo the very same pre-processing steps followed for the training (and test) data.
Some common sense is certainly expected here: in all kinds of ML modeling, new input data are expected to have the same "general form" as the original data used for training & testing. The opposite case (i.e. what you have been trying to do) does not make much sense, as you should be able to convince yourself if you stop for a moment to think about it...
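As a minimal sketch of this point (toy data and a throwaway Keras model, assuming the mean/std normalization described in the question, not your actual network):

```python
import numpy as np
from tensorflow import keras

# Toy stand-ins for the real data (hypothetical shapes and labels)
X_train = np.random.rand(100, 32, 32, 3)
y_train = np.random.randint(0, 2, size=100)
X_new = np.random.rand(5, 32, 32, 3)   # the "to classify" images, never seen in training

# Pre-processing statistics come from the TRAINING set only
mean = X_train.mean(axis=0)
std = X_train.std(axis=0) + 1e-8

# A throwaway model just to make the example runnable
model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    keras.layers.Flatten(),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit((X_train - mean) / std, y_train, epochs=1, verbose=0)

# New images must go through the very same transformation before prediction
predictions = model.predict((X_new - mean) / std)
```

The key detail is that `mean` and `std` are never recomputed on the new images; the statistics learned from the training set are reused everywhere.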
The following answers may help clarify the idea, and also illustrate the case of inverse-transforming the predictions whenever necessary: