Tags: python, neural-network, backpropagation, pybrain

What does the error output from trainer.train() in PyBrain refer to?


What does the error printed by PyBrain's Trainer.train() function refer to? More specifically, when I do this:

>>> trainer = BackpropTrainer(fnn, ds_train)
>>> trainer.train()
0.024

What does the number 0.024 mean? I am asking because when I train my own neural network I get an error output of around 3000.

>>> trainer.train()
3077.0233

Could anybody explain the significance of this number?


Solution

  • This number appears to be the weighted mean squared error of the network over one epoch of training.

    The code on GitHub is fairly easy to follow. Here's an edited version of it:

    def train(self):
        # Accumulate the error and the total weight ("ponderation")
        # over one pass through the dataset.
        self.module.resetDerivatives()
        errors = 0
        ponderation = 0.
        for seq in self.ds._provideSequences():
            e, p = self._calcDerivs(seq)
            errors += e
            ponderation += p
            # Take a gradient-descent step, then clear the derivatives
            # before the next sequence.
            self.module.params[:] = self.descent(self.module.derivs, errors)
            self.module.resetDerivatives()
        return errors / ponderation
    
    def _calcDerivs(self, seq):
        self.module.reset()
        # Forward pass: activate the network on every sample in the sequence.
        for sample in seq:
            self.module.activate(sample[0])
        error = 0
        ponderation = 0.
        # Backward pass, from the last sample to the first.
        for offset, sample in reversed(list(enumerate(seq))):
            target = sample[1]
            outerr = target - self.module.outputbuffer[offset]
            if len(sample) > 2:  # explicitly weighted examples
                importance = sample[2]
                error += 0.5 * dot(importance, outerr ** 2)
                ponderation += sum(importance)
                self.module.backActivate(outerr * importance)
            else:  # examples have unspecified weight (assume 1)
                error += 0.5 * sum(outerr ** 2)
                ponderation += len(target)
                self.module.backActivate(outerr)
        return error, ponderation
    

    It basically iterates through the dataset you provide, computing the squared error of the network on each training example (in the _calcDerivs method). If a training example includes an "importance" weight, that weight scales the example's squared error; otherwise the importance of each example is assumed to be 1.
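    To make the bookkeeping concrete, here is a small standalone sketch (plain NumPy, no PyBrain needed; `calc_seq_error` is a hypothetical helper, not part of PyBrain) that mirrors the error/ponderation accumulation of _calcDerivs for one sequence:

```python
import numpy as np

def calc_seq_error(seq, outputs):
    """Mimic the error/ponderation bookkeeping of _calcDerivs.

    seq is a list of (input, target) or (input, target, importance)
    tuples; outputs holds the network's output for each sample.
    """
    error = 0.0
    ponderation = 0.0
    for sample, out in zip(seq, outputs):
        target = np.asarray(sample[1], dtype=float)
        outerr = target - np.asarray(out, dtype=float)
        if len(sample) > 2:  # explicitly weighted example
            importance = np.asarray(sample[2], dtype=float)
            error += 0.5 * np.dot(importance, outerr ** 2)
            ponderation += importance.sum()
        else:  # unweighted: each output unit counts with weight 1
            error += 0.5 * (outerr ** 2).sum()
            ponderation += len(target)
    return error, ponderation

# Two samples with one output unit each and no importance weights:
seq = [((0.0,), (1.0,)), ((1.0,), (0.0,))]
outputs = [(0.8,), (0.4,)]
e, p = calc_seq_error(seq, outputs)
# error  = 0.5*(0.2**2) + 0.5*(0.4**2) = 0.1
# weight = 1 + 1 = 2, so the mean weighted error is 0.05
print(e, p, e / p)
```

    Dividing the accumulated error by the accumulated weight is exactly what produces the single number that train() prints.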

    After computing the derivatives and updating the parameters for each sequence, the train() method returns the total error divided by the total importance, that is, the weighted mean error over all training sequences that were processed. (In the code the total "importance" or "weight" is called ponderation.)
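    This also explains why the returned number can look "large": the error is measured in the squared units of your targets, so targets with a large magnitude produce a proportionally large value even for an equally good fit. A quick illustration (plain NumPy; the factor of 100 is an arbitrary example scale):

```python
import numpy as np

rng = np.random.default_rng(0)
targets = rng.normal(size=100)                          # targets on a unit scale
outputs = targets + rng.normal(scale=0.2, size=100)     # imperfect predictions

def mean_weighted_error(targets, outputs):
    # total 0.5 * squared error divided by total weight (1 per target value)
    return 0.5 * np.sum((targets - outputs) ** 2) / targets.size

small = mean_weighted_error(targets, outputs)
# Same quality of fit, but targets expressed in units 100x larger:
big = mean_weighted_error(100 * targets, 100 * outputs)
# big equals small * 100**2, purely because of the change of scale
print(small, big)
```

    So an error of 3000 is not necessarily worse training than an error of 0.024; if your targets have a large range, normalizing them (e.g. to zero mean and unit variance) will shrink the reported number accordingly.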