Tags: python, neural-network, pybrain

Regression with PyBrain


I'm trying to build a surrogate model from 100 samples. I have two inputs and two responses, all of which are normalised by the magnitude of their respective maxima.

Normalisation:

for i in range(0, len(array(self.samples)[0])):
    self.max_samples.append(abs(self.samples[:, i].max()))
    self.samples[:, i] /= self.max_samples[-1]
    self.minmax_samples.append([self.samples[:, i].min(), self.samples[:, i].max()])

for i in range(0, len(array(self.targets)[0])):
    self.max_targets.append(abs(self.targets[:, i].max()))
    self.targets[:, i] /= self.max_targets[-1]
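The column-wise scaling above can be written without the explicit loop. A minimal numpy sketch (the `normalise_columns` helper and the sample values are illustrative, not part of the original code):

```python
import numpy as np

def normalise_columns(data):
    # Divide each column by the magnitude of its column maximum,
    # mirroring the loop above; returns the scaled array and the maxima.
    data = np.asarray(data, dtype=float)
    max_vals = np.abs(data.max(axis=0))
    return data / max_vals, max_vals

samples = np.array([[1.0, 20.0],
                    [2.0, 40.0],
                    [4.0, 10.0]])
scaled, maxima = normalise_columns(samples)
# each scaled column now peaks at 1.0; maxima holds [4.0, 40.0]
```

Keeping `maxima` around is what lets you rescale network outputs back to physical units later.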

The network is built as follows:

self.ANN = FeedForwardNetwork(bias=True)
inLayer = TanhLayer(len(self.samples[0]))
hiddenLayer = TanhLayer(17)
outLayer = LinearLayer(len(self.targets[0]))

self.ANN.addInputModule(inLayer)
self.ANN.addModule(hiddenLayer)
self.ANN.addOutputModule(outLayer)

in_to_hidden = FullConnection(inLayer, hiddenLayer)
hidden_to_out = FullConnection(hiddenLayer, outLayer)

self.ANN.addConnection(in_to_hidden)
self.ANN.addConnection(hidden_to_out)
self.ANN.sortModules()

self.DataSet = SupervisedDataSet(len(self.samples[0]), len(self.targets[0]))
# Adding training points
for i, j in zip(self.samples, self.targets):
    self.DataSet.appendLinked(i, j)


trainer = BackpropTrainer(self.ANN, dataset=self.DataSet, momentum=0.99,
                          learningrate=0.1, verbose=True, weightdecay=0.1)
trainer.trainOnDataset(self.DataSet, 200)
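Since the targets were scaled by their maxima, any prediction from the trained network has to be rescaled before comparing it against the raw expected values. A hypothetical sketch (the `denormalise` helper and the numbers are illustrative):

```python
import numpy as np

def denormalise(prediction, max_targets):
    # Undo the earlier target normalisation: multiply each output
    # component by the stored column maximum.
    return np.asarray(prediction) * np.asarray(max_targets)

restored = denormalise([0.5, 0.25], [10.0, 4.0])
# restored == [5.0, 1.0]
```

Forgetting this step is a common reason predictions look "not at all close" to the expected values even when the training error is low.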

The total error reported by the trainer is of the order of 1e-2. I presume it can be better. The responses generated by the neural net are not at all close to the expected values.

Am I using too few data points? Do artificial neural networks do a good job when the input vector has more than 20 dimensions and there are multiple responses (> 5), but the number of sample points that can be generated is under 120?


Solution

  • You have too few samples for such a complex network.

    Your network will have 2*17=34 connections from the input layer to the hidden layer, 17*2=34 connections from the hidden layer to the output layer, and 17+2=19 connections from the bias units. That means you have 87 parameters to tune.
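    The parameter count above generalises to any fully connected single-hidden-layer network with bias. A small sketch (the `param_count` helper is illustrative):

    ```python
    def param_count(n_in, n_hidden, n_out):
        # Weights of the two full connections, plus one bias weight
        # per hidden and per output unit.
        return n_in * n_hidden + n_hidden * n_out + (n_hidden + n_out)

    print(param_count(2, 17, 2))  # 34 + 34 + 19 = 87
    ```

    Shrinking the hidden layer reduces the count quickly: with 5 hidden units the same network needs only `param_count(2, 5, 2)` = 27 parameters.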

    If you train with 70% of the sample data and reserve 30% for cross-validation and testing, you get 84 "known" values (70% of 120 samples). When the number of known values is similar to, or even lower than, the number of parameters, your neural network can easily overfit: it achieves a near-perfect match on the training data (very low training error) but is useless on any other data.

    You need a less complex network or more samples.
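To make that overfitting visible, hold out part of the data and watch the validation error separately from the training error. A plain numpy sketch of a 70/30 split (the `split_data` helper, seed, and sizes are illustrative; PyBrain datasets also offer a built-in proportional split):

```python
import numpy as np

def split_data(samples, targets, train_frac=0.7, seed=0):
    # Shuffle row indices, then cut into training and hold-out sets.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    cut = int(train_frac * len(samples))
    train, hold = idx[:cut], idx[cut:]
    return (samples[train], targets[train]), (samples[hold], targets[hold])

samples = np.arange(100).reshape(50, 2)
targets = np.arange(100).reshape(50, 2)
(train_x, train_y), (val_x, val_y) = split_data(samples, targets)
# 35 training rows, 15 hold-out rows
```

If the training error keeps dropping while the hold-out error stalls or rises, the network is memorising rather than generalising.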