
Theano: How to implement the distance between desired output (1d) and label as cost function


I would like to train a neural network to represent a function from R^n to R. The network has only one layer; the input neurons are the function's arguments and the single output neuron is the function value. For example, the function could be logical AND: two input values, one output.

In order to train such a network, I need to define a cost function that Theano can then differentiate with its gradient support. The problem: usually you would use a neural network for classification. A training sample is (input, y), where y is the desired output, i.e. the index of the output neuron that should have maximum likelihood.

That is impossible in my case: I only have one output neuron and need to compare it with the label directly, so the label is not used for indexing. In pseudocode:

if y==0:
   cost= - output
else:
   cost= - (1-output)

With this approach, the cost would have to be recomputed for every sample, since the formula for computing the cost depends on the value of y.

I believe it is necessary to implement the choice inside a Theano formula, something like this pseudocode:

block1= - output
block2= - (1 - output)
blockMatrix= [block1 : block2]
return blockMatrix[y]

In the Theano tutorials, indexing is used in combination with differentiation, so this should work.
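The stack-and-index idea above can be sketched in plain NumPy on concrete values (Theano's symbolic stacking and indexing behave analogously; the output values here are hypothetical):

```python
import numpy as np

# Hypothetical per-sample network outputs (one output neuron, four samples).
output = np.array([0.1, 0.2, 0.3, 0.9])
y = np.array([0, 0, 0, 1])  # labels

# Two candidate cost "blocks", one row per branch of the if/else.
block1 = -output          # cost used when y == 0
block2 = -(1 - output)    # cost used when y == 1
block_matrix = np.stack([block1, block2])  # shape (2, n_samples)

# Select the right row for each sample by indexing with the label.
cost_per_sample = block_matrix[y, np.arange(len(y))]
```

Here `block_matrix[y, np.arange(len(y))]` picks, for sample i, row `y[i]` and column i, which is exactly the `blockMatrix[y]` selection the pseudocode describes.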

Actual question: How does blocking work in Theano?

The types of the symbols in my code: output is a matrix, y is a vector. The samples are created like this; each row is a sample.

data_x = numpy.matrix([[0, 0],
                       [1, 0],
                       [0, 1],
                       [1, 1]])

data_y = numpy.array([0,
                      0,
                      0,
                      1])
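As an aside, `numpy.matrix` is a legacy class and plain 2-D arrays are generally preferred; the same samples can be built like this:

```python
import numpy as np

# Same AND truth table as above, as a plain 2-D array instead of numpy.matrix.
data_x = np.array([[0, 0],
                   [1, 0],
                   [0, 1],
                   [1, 1]])

data_y = np.array([0, 0, 0, 1])
```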

The complete code is on pastebin: http://pastebin.com/PByUyvMQ It closely follows this tutorial: http://deeplearning.net/tutorial/logreg.html


Solution

  • I'm not quite sure what blocking exactly means, but for simply concatenating two tensor variables there are theano.tensor.concatenate() and theano.tensor.stack().

    Moreover, there is another formulation of your problem that may save you from conditioning and blocking altogether:

    cost = -(1-y)*output -y*(1-output)
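    A quick NumPy sanity check (with hypothetical output values) that this branch-free form matches the if/else pseudocode from the question:

```python
import numpy as np

output = np.array([0.1, 0.2, 0.3, 0.9])  # hypothetical network outputs
y = np.array([0, 0, 0, 1])               # labels

# Branch-free formulation: when y == 0 only the first term survives,
# when y == 1 only the second term survives.
cost = -(1 - y) * output - y * (1 - output)

# Per-sample conditional, as in the question's pseudocode.
expected = np.where(y == 0, -output, -(1 - output))
assert np.allclose(cost, expected)
```

    Because the formula is a plain arithmetic expression in y and output, it vectorizes over all samples at once and Theano can differentiate it directly, with no per-sample recomputation.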