Tags: neural-network, xor, chaos

Initial weights causing stagnation in XOR function learning - neural network


I have a neural network with 2 input variables, 1 hidden layer with 2 neurons, and an output layer with one output neuron. When I start with some randomly generated weights (from 0 to 1), the network learns the XOR function very quickly and well, but in other cases it NEVER learns the XOR function! Do you know why this happens and how I can overcome this problem? Could some chaotic behaviour be involved? Thanks!
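
For concreteness, below is a minimal sketch of the setup described above: a 2-2-1 network with sigmoid activations trained by plain batch gradient descent on XOR, with weights drawn uniformly from [0, 1). The activation, loss, learning rate, and epoch count are illustrative assumptions, not the asker's actual code; depending on the random seed, a run may converge or stagnate, which mirrors the behaviour in question.

```python
# Minimal 2-2-1 XOR network sketch (assumed details: sigmoid activations,
# mean squared error, plain batch gradient descent).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)          # try different seeds: some succeed, some stagnate
W1 = rng.uniform(0, 1, size=(2, 2))     # input -> hidden weights, drawn from [0, 1)
b1 = np.zeros((1, 2))
W2 = rng.uniform(0, 1, size=(2, 1))     # hidden -> output weights
b2 = np.zeros((1, 1))

lr = 0.5
for epoch in range(20000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # backpropagation of the mean squared error
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(out.round(3))  # close to [[0], [1], [1], [0]] only if training escaped bad minima
```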


Solution

  • This is a quite normal situation: the error function of a multilayer NN is not convex, so the optimization can converge to a local minimum.

    You can simply keep the initial weights that resulted in successful optimization, or run the optimizer multiple times starting from different weights and keep the best solution (see the sketch after this answer). The optimization algorithm and learning rate also play a certain role; for example, backpropagation with momentum and/or stochastic gradient descent sometimes works better. Adding more neurons, beyond the minimum needed to learn XOR, also helps.

    There exist methods designed to find the global minimum, such as simulated annealing, but in practice they are not commonly used for NN optimization, except in some specific cases.
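
A minimal sketch of the random-restart idea from this answer, assuming the same toy network as in the question (sigmoid activations, mean squared error, plain gradient descent, weights from [0, 1); all illustrative choices): train from several different random initializations and keep the run with the lowest final error.

```python
# Random restarts: train the 2-2-1 XOR network from several different random
# initializations and keep the best run (assumed network details as above).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def train_xor(seed, epochs=20000, lr=0.5):
    """Train one 2-2-1 network from a fresh random initialization; return (loss, params)."""
    rng = np.random.default_rng(seed)
    W1, b1 = rng.uniform(0, 1, (2, 2)), np.zeros((1, 2))
    W2, b2 = rng.uniform(0, 1, (2, 1)), np.zeros((1, 1))
    for _ in range(epochs):
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ d_out
        b2 -= lr * d_out.sum(axis=0, keepdims=True)
        W1 -= lr * X.T @ d_h
        b1 -= lr * d_h.sum(axis=0, keepdims=True)
    loss = float(np.mean((out - y) ** 2))
    return loss, (W1, b1, W2, b2)

# Run the optimizer several times from different starting weights, keep the best.
best_loss, best_params = min((train_xor(seed) for seed in range(5)),
                             key=lambda result: result[0])
print("best final MSE:", best_loss)
```

Keeping the parameters of the best run is equivalent to the "keep the initial weights that worked" advice above, just automated over several seeds.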