I mostly understand how k-fold cross-validation works and have begun implementing it in my MATLAB scripts, but I have two questions.
First, when using it to select network hyperparameters (number of hidden units, weight decay prior and number of iterations in my case), should I reinitialise the weights after each 'fold', or should I just feed my next training fold into the already trained network (whose weights have been optimised for the previous fold)?
It seems that doing the latter should give lower errors, since the previous fold of data will be a good approximation of the next, and so the weights will start closer to a good solution than weights initialised randomly from a Gaussian distribution.
Second, once I have validated the network using k-fold cross-validation, chosen the hyperparameters, and want to start using the network, am I right in thinking that I should stop using k-fold validation and just train once, using all of the available data?
Many thanks for any help.
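Yes, you should reinitialize the weights after each fold so that every fold starts from a "blank" network. If you don't, the folds "leak" into each other: the data you validate on in one fold has already been used to train the weights in an earlier fold, so the validation error comes out optimistically low. That's not what k-fold CV is supposed to measure.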
After finding the best hyperparameters, yes, you can retrain once on all the data you used for cross-validation. Just remember to keep some hold-out test data, set aside before cross-validation and not used in this final training, for the final evaluation.
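To make the workflow concrete, here is a minimal sketch in plain Python rather than MATLAB. It is only an illustration under stated assumptions: the "network" is a toy linear model trained by gradient descent, the data is synthetic, and every name in it (init_weights, train, cv_error, the candidate weight decay values) is hypothetical and not taken from your scripts. The point is the order of operations: set the hold-out data aside first, draw fresh weights inside the fold loop, then retrain once on all the cross-validation data.

```python
import random

def init_weights(rng):
    # Fresh Gaussian-initialised weights: [slope, intercept].
    return [rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)]

def train(weights, xs, ys, n_iters, weight_decay, lr=0.3):
    # Batch gradient descent on mean squared error plus an L2 penalty on the slope.
    w, b = weights
    n = len(xs)
    for _ in range(n_iters):
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errs, xs)) / n + 2 * weight_decay * w
        grad_b = 2 * sum(errs) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return [w, b]

def mse(weights, xs, ys):
    w, b = weights
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def k_fold_indices(n_samples, k):
    # Deal indices round-robin into k disjoint validation folds.
    return [list(range(i, n_samples, k)) for i in range(k)]

def cv_error(xs, ys, k, n_iters, weight_decay, rng):
    fold_errors = []
    for val_idx in k_fold_indices(len(xs), k):
        held_out = set(val_idx)
        tr_x = [xs[i] for i in range(len(xs)) if i not in held_out]
        tr_y = [ys[i] for i in range(len(xs)) if i not in held_out]
        va_x = [xs[i] for i in val_idx]
        va_y = [ys[i] for i in val_idx]
        # Reinitialise here, inside the loop: every fold starts from a blank model.
        weights = init_weights(rng)
        weights = train(weights, tr_x, tr_y, n_iters, weight_decay)
        fold_errors.append(mse(weights, va_x, va_y))
    return sum(fold_errors) / k

# Synthetic toy data (an assumption for illustration): y = 2x + 1 plus small noise.
rng = random.Random(0)
xs = [i / 29 for i in range(30)]
ys = [2 * x + 1 + rng.gauss(0.0, 0.05) for x in xs]

# Set the hold-out test set aside before any cross-validation.
test_idx = set(range(0, 30, 5))
cv_x = [xs[i] for i in range(30) if i not in test_idx]
cv_y = [ys[i] for i in range(30) if i not in test_idx]
test_x = [xs[i] for i in sorted(test_idx)]
test_y = [ys[i] for i in sorted(test_idx)]

# Choose the weight decay with the lowest cross-validated error.
candidates = [0.0, 0.01, 0.1]
scores = {wd: cv_error(cv_x, cv_y, k=4, n_iters=500, weight_decay=wd, rng=rng)
          for wd in candidates}
best_wd = min(scores, key=scores.get)

# Retrain once from fresh weights on all the cross-validation data,
# then evaluate a single time on the untouched hold-out set.
final_weights = train(init_weights(rng), cv_x, cv_y, 500, best_wd)
test_error = mse(final_weights, test_x, test_y)
print(best_wd, test_error)
```

If init_weights were moved above the for loop in cv_error, each fold would continue from weights already fitted to data that includes the next validation fold, which is exactly the leak described above.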