Tags: neural-network, artificial-intelligence, backpropagation, mnist, feed-forward

MNIST - Training stuck


I'm reading Neural Networks and Deep Learning (first two chapters), and I'm trying to follow along and build my own ANN to classify digits from the MNIST data set.

I've been scratching my head for several days now, since my implementation peaks at ~57% accuracy on the test set (some 5734/10000 correctly classified digits) after 10 epochs. Accuracy on the training set stagnates after the tenth epoch, and accuracy on the test set then deteriorates, presumably because of over-fitting.

I'm using nearly the same configuration as in the book: a 2-layer feedforward ANN (784-30-10) with all layers fully connected, standard sigmoid activation functions, a quadratic cost function, and weights initialized the same way (drawn from a Gaussian distribution with mean 0 and standard deviation 1). The only differences are that I'm using online training instead of batch/mini-batch training, and a learning rate of 1.0 instead of 3.0 (though I have also tried mini-batch training with a learning rate of 3.0).
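
For reference, here is a rough C++ sketch of that setup (not my actual code; the structure and names are just illustrative): a fully connected layer with weights and biases drawn from a Gaussian with mean 0 and standard deviation 1.

    #include <cstddef>
    #include <cstdio>
    #include <random>
    #include <vector>

    // Illustrative sketch only: one fully connected layer with
    // Gaussian-initialized weights and biases (mean 0, std dev 1).
    struct Layer {
        std::vector<std::vector<double>> weights; // weights[j][k]: input k -> neuron j
        std::vector<double> biases;               // biases[j]: bias of neuron j
    };

    Layer make_layer(std::size_t n_in, std::size_t n_out, std::mt19937& rng) {
        std::normal_distribution<double> gauss(0.0, 1.0);
        Layer layer;
        layer.weights.assign(n_out, std::vector<double>(n_in));
        layer.biases.assign(n_out, 0.0);
        for (std::size_t j = 0; j < n_out; ++j) {
            for (std::size_t k = 0; k < n_in; ++k)
                layer.weights[j][k] = gauss(rng);
            layer.biases[j] = gauss(rng);
        }
        return layer;
    }

    int main() {
        std::mt19937 rng(42);
        Layer hidden = make_layer(784, 30, rng); // 784 -> 30
        Layer output = make_layer(30, 10, rng);  // 30 -> 10
        // Sigmoid activations, quadratic cost, and online SGD with a
        // learning rate of 1.0 would go in the training loop (omitted here).
        std::printf("hidden: %zu neurons, output: %zu neurons\n",
                    hidden.biases.size(), output.biases.size());
        return 0;
    }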

And yet, my implementation doesn't get past 60% accuracy even after many epochs, whereas in the book the ANN goes above 90% right after the first epoch with pretty much the exact same configuration. At first I had messed up implementing the backpropagation algorithm, but after reimplementing backpropagation differently three times, with exactly the same results each time, I'm stumped...

An example of the results the backpropagation algorithm is producing:

Take a simpler feedforward network with the same configuration mentioned above (online training + a learning rate of 1.0): 3 input neurons, 2 hidden neurons and 1 output neuron.

The initial weights are initialized as follows:

Layer #0 (3 neurons)

Layer #1 (2 neurons)
  - Neuron #1: weights=[0.1, 0.15, 0.2] bias=0.25
  - Neuron #2: weights=[0.3, 0.35, 0.4] bias=0.45

Layer #2 (1 neuron)
  - Neuron #1: weights=[0.5, 0.55] bias=0.6

Given an input of [0.0, 0.5, 1.0], the output is 0.78900331. Backpropagating for the same input and with the desired output of 1.0 gives the following partial derivatives (dw = derivative wrt weight, db = derivative wrt bias):

Layer #0 (3 neurons)

Layer #1 (2 neurons)
  - Neuron #1: dw=[0, 0.0066968054, 0.013393611] db=0.013393611
  - Neuron #2: dw=[0, 0.0061298212, 0.012259642] db=0.012259642

Layer #2 (1 neuron)
  - Neuron #1: dw=[0.072069918, 0.084415339] db=0.11470326

Updating the network with those partial derivatives yields a corrected output value of 0.74862305.
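
For anyone who wants to check independently, here is a minimal standalone C++ sketch (not my actual implementation) of one forward/backward pass on this toy network, using the textbook quadratic-cost formulas. The forward output should come out around 0.789, but the gradients it prints follow the (a - y) sign convention, so their signs (and possibly the exact cost convention) may differ from the values I listed above:

    #include <cmath>
    #include <cstdio>
    #include <vector>

    double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

    int main() {
        std::vector<double> x = {0.0, 0.5, 1.0};
        double y = 1.0; // desired output

        // Hidden layer (2 neurons), same weights/biases as listed above
        double w1[2][3] = {{0.1, 0.15, 0.2}, {0.3, 0.35, 0.4}};
        double b1[2]    = {0.25, 0.45};
        // Output layer (1 neuron)
        double w2[2] = {0.5, 0.55};
        double b2    = 0.6;

        // Forward pass: a = sigmoid(w . x + b), layer by layer
        double z1[2], a1[2];
        for (int j = 0; j < 2; ++j) {
            z1[j] = b1[j];
            for (int k = 0; k < 3; ++k) z1[j] += w1[j][k] * x[k];
            a1[j] = sigmoid(z1[j]);
        }
        double z2 = b2 + w2[0] * a1[0] + w2[1] * a1[1];
        double a2 = sigmoid(z2);
        std::printf("output: %.8f\n", a2); // should be ~0.789

        // Backward pass for the quadratic cost C = 1/2 * (a - y)^2
        double delta2 = (a2 - y) * a2 * (1.0 - a2); // output-layer error
        std::printf("output neuron: dw=[%.8f, %.8f] db=%.8f\n",
                    delta2 * a1[0], delta2 * a1[1], delta2);

        for (int j = 0; j < 2; ++j) {
            // Hidden-layer error: backpropagate through the output weight
            double delta1 = w2[j] * delta2 * a1[j] * (1.0 - a1[j]);
            std::printf("hidden neuron %d: dw=[%.8f, %.8f, %.8f] db=%.8f\n",
                        j + 1, delta1 * x[0], delta1 * x[1], delta1 * x[2], delta1);
        }
        return 0;
    }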


If anyone would be kind enough to confirm the above results, it would help me tremendously, since it would let me definitively rule out a faulty backpropagation implementation as the reason for the problem.

Has anyone tackling MNIST come across this issue before? Even suggestions for things I should check would help, since I'm really lost here.


Solution

  • Doh..

    Turns out nothing was wrong with my backpropagation implementation...

    The problem was that I read the images into a signed char array (in C++), so pixel values above 127 wrapped around to negative values, and when I divided by 255.0 to normalize the input vectors into the range 0.0-1.0 I actually got negative values... ;-;
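
    In code, the difference looks roughly like this (a sketch of the loading step, not my actual loader; the function name and buffer handling are made up):

        #include <cstddef>
        #include <cstdint>
        #include <fstream>
        #include <vector>

        // Illustrative sketch of the loading bug (names are made up).
        std::vector<double> load_pixels(std::ifstream& in, std::size_t n_pixels) {
            std::vector<char> raw(n_pixels);
            in.read(raw.data(), static_cast<std::streamsize>(n_pixels));

            std::vector<double> pixels(n_pixels);
            for (std::size_t i = 0; i < n_pixels; ++i) {
                // BUG (what I had): with a signed char, bytes >= 128 wrap to
                // negative values, so the "normalized" inputs land in roughly
                // [-0.5, 0.5) instead of [0.0, 1.0]:
                // pixels[i] = raw[i] / 255.0;

                // FIX: treat the byte as unsigned before converting to double:
                pixels[i] = static_cast<std::uint8_t>(raw[i]) / 255.0;
            }
            return pixels;
        }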

    So basically I spent some four days debugging and reimplementing the same thing when the problem was somewhere else entirely.