I have a neural network written in standard C++11 which I believe follows the back-propagation algorithm correctly (based on this). If I output the error in each step of the algorithm, however, it seems to oscillate without dampening over time. I've tried removing momentum entirely and choosing a very small learning rate (0.02), but it still oscillates at roughly the same amplitude per network (with each network having a different amplitude within a certain range).
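For reference, the per-weight update I'm aiming for is the standard back-propagation step with a momentum term, roughly as below. This is a paraphrase for illustration, not my actual code; `updateWeight` and its parameter names are placeholders:

```cpp
// Paraphrase of the usual back-propagation weight update with momentum.
// All names here are placeholders, not the ones in my code.
double updateWeight(double& weight, double& previousDelta,
                    double gradient,      // dE/dw for this weight
                    double learningRate,  // e.g. 0.02
                    double momentum)      // 0 when momentum is removed
{
    double delta = -learningRate * gradient + momentum * previousDelta;
    weight += delta;
    previousDelta = delta;  // remembered as the "previous change" for the next step
    return delta;
}
```

With momentum set to 0 this reduces to plain gradient descent, which is why I expected the oscillation to die down at a 0.02 learning rate.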
Further, all inputs result in the same output (a problem I found posted here before, although for a different language; that author also mentions that he never got it working).
The code can be found here.
To summarize how I have implemented the network:
- `Neuron`s hold the current weights to the neurons ahead of them, the previous changes to those weights, and the sum of all inputs.
- `Neuron`s can have their value (the sum of all inputs) accessed, or can output the result of passing that value through a given activation function.
- `NeuronLayer`s act as `Neuron` containers and set up the actual connections to the next layer.
- `NeuronLayer`s can send the actual outputs to the next layer (instead of the next layer pulling from the previous one).
- `FFNeuralNetwork`s act as containers for `NeuronLayer`s and manage forward-propagation, error calculation, and back-propagation. They can also simply process inputs.
- The `FFNeuralNetwork` sends its weighted values (value * weight) to the next layer. Each neuron in each layer after that outputs the weighted result of the activation function unless it is a bias or the layer is the output layer (biases output the weighted value; the output layer simply passes the sum through the activation function).

(A minimal skeleton of this layout is sketched at the end of the question.)

Have I made a fundamental mistake in the implementation (a misunderstanding of the theory), or is there some simple bug I haven't found yet? If it is a bug, where might it be?
Why might the error oscillate by the amount it does (around ±(0.2 ± learning rate)) even with a very low learning rate? Why might all the outputs be the same, no matter the input?
I've gone over most of it so much that I might be skipping over something, but I think I may have a plain misunderstanding of the theory.
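To make the layout described above concrete, here is roughly what it reduces to. This is a heavily simplified skeleton; the member and method names are placeholders rather than my actual declarations:

```cpp
#include <cmath>
#include <vector>

// Skeleton of the structure summarized above; names are illustrative only.
struct Neuron {
    std::vector<double> weights;         // current weights to the neurons ahead
    std::vector<double> previousDeltas;  // previous changes to those weights
    double value = 0.0;                  // running sum of all inputs

    double raw() const { return value; }                                 // value can be read directly...
    double activated() const { return 1.0 / (1.0 + std::exp(-value)); }  // ...or passed through the activation (sigmoid here)
};

struct NeuronLayer {
    std::vector<Neuron> neurons;  // acts as a Neuron container, wires up the
                                  // connections to the next layer, and pushes
                                  // outputs forward rather than being pulled from
};

struct FFNeuralNetwork {
    std::vector<NeuronLayer> layers;  // forward-propagation, error calculation,
                                      // and back-propagation are managed here
};
```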
It turns out I was just staring at the `FFNeuralNetwork` parts too much and accidentally used the wrong input set to confirm the correctness of the network. It actually does work correctly with the right learning rate, momentum, and number of iterations.
Specifically, in `main`, I was using `inputs` instead of a smaller array `in` to test the outputs of the network.
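The check at the end of `main` therefore looked roughly like the following. The `FFNeuralNetwork` interface shown here (`process()`) and the sample patterns are stand-ins rather than my real API and data, but the `inputs`/`in` mix-up is the actual mistake:

```cpp
#include <vector>

// Stand-in for the real class; only the call pattern matters here.
struct FFNeuralNetwork {
    std::vector<double> process(const std::vector<double>& pattern) {
        return pattern;  // placeholder for the real forward pass
    }
};

int main() {
    FFNeuralNetwork network;  // in the real code: built and trained first

    // Training set, all patterns at once (illustrative values).
    std::vector<std::vector<double>> inputs = { {0, 0}, {0, 1}, {1, 0}, {1, 1} };
    (void)inputs;

    // Smaller array meant for checking the trained network.
    std::vector<double> in = {1, 0};

    // The mistake: the check was being fed from `inputs` rather than `in`,
    // so I was judging the network against the wrong data.
    std::vector<double> out = network.process(in);  // corrected version
    (void)out;                                      // printed/inspected in the real code
}
```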