I have this 5-5-2 backpropagation neural network I'm training, and after reading this awesome article by LeCun I started to put in practice some of the ideas he suggests.
Currently I'm evaluating it with a 10-fold cross-validation algorithm I made myself, which goes basically like this:
for each epoch
for each possible split (training, validation)
train and validate
end
compute mean MSE between all k splits
end
My inputs and outputs are standardized (0-mean, variance 1) and I'm using a tanh activation function. All network algorithms seem to work properly: I used the same implementation to approximate the sin function and it does it pretty good.
Now, the question is as the title implies: should I standardize each train/validation set separately or do I simply need to standardize the whole dataset once?
Note that if I do the latter, the network doesn't produce meaningful predictions, but I prefer having a more "theoretical" answer than just looking at the outputs.
By the way, I implemented it in C, but I'm also comfortable with C++.
You will most likely be better off standardizing each training set individually. The purpose of cross-validation is to get a sense for how well your algorithm generalizes. When you apply your network to new inputs, the inputs will not be ones that were used to compute your standardization parameters. If you standardize the entire data set at once, you are ignoring the possibility that a new input will fall outside the range of values over which you standardized.
So unless you plan to re-standardize every time you process a new input (which I'm guessing is unlikely), you should only compute the standardization parameters for the training set of the partition being evaluated. Furthermore, you should compute those parameters only on the training set of the partition, not the validation set (i.e., each of the 10-fold partitions will use 90% of the data to calculate standardization parameters).