Tags: python, neural-network, pybrain

Why does a trained PyBrain network yield different results even for an input used in training?


I have trained a neural network using PyBrain, but when I test the network with the same input that was used for training, I get a completely different result. Here is my code:

from pybrain.structure import FeedForwardNetwork
from pybrain.structure import LinearLayer, SigmoidLayer
from pybrain.structure import FullConnection
import numpy as np
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised import BackpropTrainer
from pybrain.tools.xml.networkreader import NetworkReader
from pybrain.tools.xml.networkwriter import NetworkWriter
from pybrain.utilities import percentError

# Build a 2-3-1 feed-forward network by hand
n = FeedForwardNetwork()

inLayer = LinearLayer(2)
hiddenLayer = SigmoidLayer(3)
outLayer = LinearLayer(1)

n.addInputModule(inLayer)
n.addModule(hiddenLayer)
n.addOutputModule(outLayer)

in_to_hidden = FullConnection(inLayer, hiddenLayer)
hidden_to_out = FullConnection(hiddenLayer, outLayer)

n.addConnection(in_to_hidden)
n.addConnection(hidden_to_out)
n.sortModules()

# Training data, scaled into [0, 1]: inputs by their column maxima, targets by 100
X = np.array(([3, 5], [5, 1], [10, 2]), dtype=float)
Y = np.array(([75], [82], [93]), dtype=float)
X /= np.amax(X, axis=0)
Y /= 100

# Activation and input-to-hidden weights before training (random initialisation)
print(n.activate([1, 2]))
print(in_to_hidden.params)

# Wrap the samples in a SupervisedDataSet
ds = SupervisedDataSet(2, 1)
for i in range(len(X)):
    ds.addSample(X[i], Y[i])

# Train with backpropagation until the validation error stops improving
trainer = BackpropTrainer(n, ds, learningrate=0.5, momentum=0.05, verbose=True)
trainer.trainUntilConvergence(ds)
trainer.testOnData(ds, verbose=True)

Now when I test on an input with the code print("Testing", n.activate([3,5])), I get ('Testing', array([ 1.17809308])). I should have got around 0.75 for the input n.activate([3,5]), so I don't understand this strange result.


Solution

  • If I understand you correctly, this is just one aspect of model validation that you will always have to undertake. The network generally seeks to minimise its error against all of the training data, but it will not get each result exactly. You could probably improve prediction accuracy by running more epochs with more hidden neurons. However, doing so would eventually lead to over-fitting through excessive flexibility. It's a bit of a balancing act.
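    A minimal sketch of both of those knobs, re-using the ds, X and Y from the question and PyBrain's buildNetwork shortcut rather than the hand-built net (the hidden-layer size of 10 and the epoch count are arbitrary, not recommendations):

    from pybrain.tools.shortcuts import buildNetwork
    from pybrain.structure import SigmoidLayer
    from pybrain.supervised import BackpropTrainer

    # Same 2-input / 1-output shape, but with a wider sigmoid hidden layer
    wider_net = buildNetwork(2, 10, 1, hiddenclass=SigmoidLayer)
    trainer = BackpropTrainer(wider_net, ds, learningrate=0.1, momentum=0.05, verbose=True)
    trainer.trainEpochs(1000)              # a fixed, larger number of epochs
    print(wider_net.activate(X[0]))        # compare against Y[0] = 0.75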

    As an analogy, consider regression. In the linear case below, the model does not match any of the training (blue) data, but it generally captures the trend for both the blue and the red (external test) data. Using the linear equation would always give me a slightly wrong answer for every data point, but it's a decent approximator. Now say that I fit a polynomial trendline to the data instead. It has a lot more flexibility, hitting all of the blue points, but the error on the testing data has increased.

    (figure: linear vs. polynomial regression fits to training and test data)
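    The analogy can be reproduced with a few lines of numpy (synthetic data, not the poster's): a degree-1 fit misses every noisy training point but follows the trend, while a high-degree polynomial nails the training points and will typically do worse on held-out points.

    import numpy as np

    rng = np.random.RandomState(0)
    x_train = np.linspace(0, 1, 6)
    y_train = 2 * x_train + rng.normal(0, 0.2, x_train.shape)   # noisy linear trend
    x_test = np.linspace(0.05, 0.95, 5)
    y_test = 2 * x_test + rng.normal(0, 0.2, x_test.shape)

    for degree in (1, 5):
        coeffs = np.polyfit(x_train, y_train, degree)
        train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
        test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
        print(degree, train_mse, test_mse)   # compare train vs. test error per degree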

    Once you have your network built, you need to run all of your data back through it. You can then validate on average absolute deviation, MSE, MASE, etc., in addition to things like k-fold cross-validation. Your tolerance of error depends on your application: in engineering, I might always need to be within 5% error, and anything that exceeds that threshold (which would occur in the second graph) could have fatal consequences. In language processing, I might be able to tolerate one or two real mess-ups and try to catch them another way if the majority of predictions are very close, so I'd possibly take the second graph.
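    For example, a minimal validation pass over the trained network n and dataset ds from the question might look like this (just MSE and mean absolute deviation; a k-fold scheme would repeat the same computation on held-out folds):

    import numpy as np

    # Re-run every training sample through the trained network
    preds = np.array([n.activate(inp) for inp in ds['input']])
    targets = ds['target']

    mse = np.mean((preds - targets) ** 2)
    mad = np.mean(np.abs(preds - targets))
    print("MSE:", mse, "mean absolute deviation:", mad)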

    Playing with your learning rate and momentum might help converge on a better solution.
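    If you want to experiment with those two parameters, something as simple as the following is enough (continuing with the n and ds above; the values are arbitrary starting points, not recommendations):

    from pybrain.supervised import BackpropTrainer

    # A gentler learning rate and slightly more momentum than learningrate=0.5, momentum=0.05
    trainer = BackpropTrainer(n, ds, learningrate=0.05, momentum=0.1, verbose=True)
    trainer.trainUntilConvergence(maxEpochs=1000)
    print(n.activate(X[0]))   # compare against Y[0] = 0.75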

    EDIT: Based on comments

    The comment "should have been able to recognise it" implies to me something different from what a neural network actually does. There is not even a vague concept of memory in the network; it simply uses the training data to develop a convoluted set of rules that try to minimise its error against all data points. Once the network is trained, it has no recollection of any of the training data; it's just left with a spaghetti of multiplication steps that it performs on input data. So no matter how good your network is, you will never be able to reverse-map your training inputs to exactly the right answer.
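    That "spaghetti of multiplications" can be made concrete for the 2-3-1 network in the question (no bias units): the sketch below reproduces n.activate by hand from the two connections' weights, assuming PyBrain's convention of storing FullConnection parameters as an (output x input) matrix.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([3.0, 5.0])
    W_ih = in_to_hidden.params.reshape(3, 2)    # hidden x input weights
    W_ho = hidden_to_out.params.reshape(1, 3)   # output x hidden weights

    manual = W_ho.dot(sigmoid(W_ih.dot(x)))     # linear in, sigmoid hidden, linear out
    print(manual, n.activate([3, 5]))           # the two should agree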

    The idea of "convergence" cannot be taken to mean that you have a good network. The network might just have found a local minimum in its error and given up learning. That is why you must always validate your models. If you are not happy with the result of the validation, you can try to improve the model by (a rough sketch of the first two options follows the list):
    - Simply re-running it again. The random initialisation of the network might now avoid the local minimum
    - Changing the number of neurons. This loosens or tightens the flexibility of the model
    - Changing the learning rate and momentum
    - Changing the learning rule, e.g. swapping from Levenberg-Marquardt to Bayesian regularisation
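    A rough sketch of the first two options, getting fresh random weights on each attempt simply by rebuilding the network, and trying several hidden-layer sizes (the specific sizes, epoch cap and trainer settings are arbitrary):

    from pybrain.tools.shortcuts import buildNetwork
    from pybrain.structure import SigmoidLayer
    from pybrain.supervised import BackpropTrainer

    for hidden in (3, 5, 10):          # option 2: change the number of neurons
        for attempt in range(3):       # option 1: re-run from a new random initialisation
            net = buildNetwork(2, hidden, 1, hiddenclass=SigmoidLayer)
            trainer = BackpropTrainer(net, ds, learningrate=0.1, momentum=0.05)
            trainer.trainUntilConvergence(maxEpochs=300)
            print(hidden, attempt, trainer.testOnData(ds))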