I would like to train a neural network to represent a function from R^n to R. The network has only one layer: the input neurons are the function arguments, and the single output neuron is the function value. For example, the function could be "logical and": two input values, one output.
In order to train such a network, I need to define a cost function that can then be differentiated with Theano's support for gradients. The problem is that a neural network is usually used for classification, where a training sample is (input, y) and y is the desired output, i.e. the index of the output neuron that should have maximum likelihood.
That is impossible here: I only have one output neuron and need to compare its value with the label, so the label is not used for indexing. In pseudocode:
if y == 0:
    cost = -output
else:
    cost = -(1 - output)
With this approach, the cost expression would have to be rebuilt for every sample, since the formula for computing the cost depends on the value of y.
I believe it is necessary to implement the choice inside a Theano expression, something like this pseudocode:
block1 = -output
block2 = -(1 - output)
blockMatrix = [block1 : block2]
return blockMatrix[y]
In the Theano tutorials, indexing is used in combination with differentiation, so this should work.
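Roughly what I imagine in real Theano code (an untested sketch; the symbol names are mine, not from the tutorial, and I assume output has shape (n_samples, 1)):

import theano.tensor as T

output = T.matrix('output')  # network output, shape (n_samples, 1)
y = T.ivector('y')           # one label (0 or 1) per sample

block1 = -output             # cost rows for label 0
block2 = -(1 - output)       # cost rows for label 1
blocks = T.stack([block1, block2])  # shape (2, n_samples, 1)

# For each sample i, pick blocks[y[i], i]; the mean gives a scalar cost.
cost = blocks[y, T.arange(y.shape[0])].mean()
grad = T.grad(cost, output)  # differentiation works through the indexing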
Actual question: How does blocking work in Theano?
The types of the symbols in my code: output is a matrix and y is a vector. The samples are created like this; each row is a sample.
import numpy

data_x = numpy.matrix([[0, 0],
                       [1, 0],
                       [0, 1],
                       [1, 1]])
data_y = numpy.array([0,
                      0,
                      0,
                      1])
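For completeness, this is roughly how the arrays become Theano values in my code (a sketch following the shared-variable pattern of the tutorial linked below; variable names are mine):

import numpy
import theano

shared_x = theano.shared(numpy.asarray(data_x, dtype=theano.config.floatX))
shared_y = theano.shared(numpy.asarray(data_y, dtype='int32'))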
The complete code is on pastebin: http://pastebin.com/PByUyvMQ It closely follows this tutorial: http://deeplearning.net/tutorial/logreg.html
I'm not quite sure what blocking means exactly, but for simply combining two tensor variables there are theano.tensor.concatenate() and theano.tensor.stack().
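The difference in a nutshell (the shapes below are just illustrative):

import theano.tensor as T

a = T.matrix('a')  # say shape (4, 1)
b = T.matrix('b')  # say shape (4, 1)

c = T.concatenate([a, b], axis=1)  # joins along an existing axis: (4, 2)
s = T.stack([a, b])                # adds a new leading axis: (2, 4, 1)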
Moreover, there's another formulation of your cost that may save you from conditioning and blocking entirely:
cost = -(1 - y) * output - y * (1 - output)
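Per sample this reproduces your if/else pseudocode exactly, since the (1 - y) and y factors switch the two branches on and off. As an untested sketch (assuming output has shape (n_samples, 1) as in your question, flattened so it broadcasts against y):

import theano.tensor as T

output = T.matrix('output')  # shape (n_samples, 1)
y = T.ivector('y')           # shape (n_samples,)

out = output.flatten()       # shape (n_samples,)
cost = (-(1 - y) * out - y * (1 - out)).mean()
grad = T.grad(cost, output)  # ready for gradient descent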