Tags: python, neural-network, artificial-intelligence, pybrain

Why can't PyBrain Learn Binary


I am attempting to get a network (PyBrain) to learn binary. This is my code, and it keeps returning values around 8, but it should return 9 when I activate it with this target.

from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure import SigmoidLayer
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer
from matplotlib.pyplot import plot, show


trains = 3000          # number of training epochs
hiddenLayers = 4       # number of units in the single hidden layer
dim = 4                # number of input bits
target = (1, 0, 0, 1)  # binary 1001, so the output should be 9

ds = SupervisedDataSet(dim, 1)

ds.addSample((0, 0, 0, 0), (0,))
ds.addSample((0, 0, 0, 1), (1,))
ds.addSample((0, 0, 1, 0), (2,))
ds.addSample((0, 0, 1, 1), (3,))
ds.addSample((0, 1, 0, 0), (4,))
ds.addSample((0, 1, 0, 1), (5,))
ds.addSample((0, 1, 1, 0), (6,))
ds.addSample((0, 1, 1, 1), (7,))
ds.addSample((1, 0, 0, 0), (8,))


net = buildNetwork(dim, hiddenLayers, 1, bias=True, hiddenclass=SigmoidLayer)
trainer = BackpropTrainer(net, ds)

tests = []

for i in range(trains):
    trainer.train()
    tests.append(net.activate(target))


plot(range(len(tests)), tests)


print(net.activate(target))
show()

I have tried adjusting the number of hidden nodes, changing the hiddenclass from TanhLayer to SigmoidLayer, and varying the number of training iterations, but the output always converges to roughly the same wrong value after about 500 passes over the dataset. Should I be using a different trainer than backpropagation, and if so, why?


Solution

  • You've built a network with 4 input nodes, 4 hidden nodes, 1 output node, and 2 bias units.

    (Diagram: the 4 input nodes plus a bias unit feed each of the 4 hidden nodes, and the 4 hidden nodes plus a second bias unit feed the single output node; every connection carries one weight.)

    Labelling the input activations A–D and the hidden activations E–H, each hidden node computes its activation as sigmoid(w0*1 + w1*A + w2*B + w3*C + w4*D), and the output node computes its activation as w0*1 + w1*E + w2*F + w3*G + w4*H (with no sigmoid). Each connection in the diagram is one weight parameter that is tweaked during learning: 5 per hidden node plus 5 for the output node, 25 in total.

    With 25 free parameters but only 9 samples to train on, there are many locally optimal, not-quite-right solutions that the network can converge to.

    One way to fix this is to increase the number of training samples. You could generalize past 1s and 0s and offer real-valued samples such as ((0, 0, 1.0, 0.5), (2.5,)) and ((0, 1.2, 0.0, 1.0), (5.8,)), where the target is still the same weighted sum of the inputs; a sketch of generating such samples follows below.

    Another option is to simplify your model. All you need for a perfect solution is the 4 inputs hooked directly to the output, with no biases or sigmoids. That model has only 4 weights, which training would drive to 8, 4, 2, and 1 (for the first through fourth inputs, given the sample encoding above). The final computation would then be 8*A + 4*B + 2*C + 1*D, exactly the place-value expansion of the binary number; a sketch of this model follows as well.
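    As an illustrative sketch of the first option (not part of the original answer), you could continue from the question's code and enlarge ds with NumPy by drawing random real-valued inputs and computing the target as the same place-value sum 8*A + 4*B + 2*C + 1*D; the 0–1.5 range and the sample count are arbitrary choices:

    import numpy as np

    # Illustrative sketch: add extra real-valued samples to the question's dataset `ds`,
    # with each target computed as the place-value weighted sum 8*A + 4*B + 2*C + 1*D.
    place_values = np.array([8.0, 4.0, 2.0, 1.0])

    for _ in range(500):                      # 500 extra samples, an arbitrary choice
        x = np.random.uniform(0.0, 1.5, 4)    # real-valued inputs, not just 0s and 1s
        ds.addSample(tuple(x), (float(np.dot(place_values, x)),))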
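    And here is a minimal sketch of the second option, built from PyBrain's standard building blocks (FeedForwardNetwork, LinearLayer, FullConnection): wire the 4 inputs straight to one linear output, with no hidden layer and no bias units, leaving exactly 4 trainable weights.

    from pybrain.structure import FeedForwardNetwork, LinearLayer, FullConnection
    from pybrain.supervised.trainers import BackpropTrainer

    # Simplified model: 4 linear inputs connected directly to 1 linear output,
    # no hidden layer and no biases, so only 4 weights are learned.
    net = FeedForwardNetwork()
    inp = LinearLayer(4)
    out = LinearLayer(1)
    net.addInputModule(inp)
    net.addOutputModule(out)
    conn = FullConnection(inp, out)
    net.addConnection(conn)
    net.sortModules()

    trainer = BackpropTrainer(net, ds)   # the dataset from the question (optionally extended as above)
    for _ in range(3000):
        trainer.train()

    print(net.activate((1, 0, 0, 1)))    # should now approach 9
    print(conn.params)                   # the 4 weights should approach 8, 4, 2, 1

    Because this model is linear in its 4 weights, the squared-error surface is convex, so plain gradient descent can reach the exact place-value weights instead of settling into a not-quite-right local optimum.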