python, neural-network, theano

Dynamically creating symbolic expressions in theano


I am trying to implement a cost function in Theano for a feed-forward neural network with multiple hidden layers. The cost function is

cost=((W1*W1).sum()+(b1*b1).sum()+(W2*W2).sum()+(b2*b2).sum())*reg_lambda

However, I decide the number of hidden layers at runtime through a constructor argument of the network class, so the number of Ws and bs is only known at runtime and hence the expression for the cost has to be created at runtime as well. I could compute the sums of the Ws and bs outside the Theano function and simply pass in the scalar values, but I need the symbolic expression for computing gradients later. How do I build the symbolic expression at runtime?


Solution

  • You can use regular Python loops to construct the cost for a dynamic number of layers. Note that Theano 'run time' and Python 'run time' are two different things: Theano's 'compile time' happens during Python's 'run time', so you can use Python code to construct dynamic Theano expressions that depend on parameters known only when the Python code is running.
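
    As a minimal illustration of that point (the names params, n_layers and reg_lambda below are placeholders, not taken from the question), an ordinary Python loop over a list of shared variables builds one symbolic cost expression, and because the expression is symbolic it can still be differentiated afterwards:

    import numpy
    import theano
    import theano.tensor as tt

    # Pretend the number of parameter matrices is only known at Python run time.
    n_layers = 4
    params = [theano.shared(numpy.random.standard_normal((3, 3))
                            .astype(theano.config.floatX))
              for _ in range(n_layers)]
    reg_lambda = 0.1

    # Build the L2 penalty with a plain Python loop; each iteration extends
    # the symbolic graph, which is only compiled later by theano.function.
    cost = 0
    for p in params:
        cost = cost + reg_lambda * tt.sum(p ** 2)

    # The dynamically built cost is still symbolic, so gradients are available.
    grads = tt.grad(cost, params)
    f = theano.function([], [cost] + grads)
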

    The cost you give is only L2 regularization of the network parameters. You presumably have additional components for the full cost. Here's a full example.

    import numpy
    import theano
    import theano.tensor as tt
    
    
    def compile(input_size, hidden_sizes, output_size, reg_lambda, learning_rate):
        ws, bs = [], []
        x = tt.matrix('x')
        x.tag.test_value = numpy.random.standard_normal(size=(2, input_size))\
            .astype(theano.config.floatX)
        previous_size = input_size
        h = x
        # Python loop over the hidden layer sizes (known only at Python run time);
        # each iteration extends the symbolic graph with one tanh layer.
        for hidden_size in hidden_sizes:
            w = theano.shared(
                    numpy.random.standard_normal(size=(previous_size, hidden_size))
                              .astype(theano.config.floatX))
            b = theano.shared(numpy.zeros((hidden_size,), dtype=theano.config.floatX))
            h = tt.tanh(tt.dot(h, w) + b)
            ws.append(w)
            bs.append(b)
            previous_size = hidden_size
        # Softmax output layer.
        w = theano.shared(numpy.random.standard_normal(size=(previous_size, output_size))
                          .astype(theano.config.floatX))
        b = theano.shared(numpy.zeros((output_size,), dtype=theano.config.floatX))
        y = tt.nnet.softmax(tt.dot(h, w) + b)
        ws.append(w)
        bs.append(b)
        z = tt.ivector('z')
        # Cast to int32 so the test value matches the ivector's dtype.
        z.tag.test_value = numpy.random.randint(output_size, size=(2,)).astype(numpy.int32)
        cost = tt.nnet.categorical_crossentropy(y, z).mean()
        # Second Python loop: add an L2 penalty for every layer's parameters.
        for w, b in zip(ws, bs):
            cost += tt.sum(w ** 2) * reg_lambda
            cost += tt.sum(b ** 2) * reg_lambda
        # Plain gradient-descent updates for every parameter.
        updates = [(p, p - learning_rate * tt.grad(cost, p)) for p in ws + bs]
        return theano.function([x, z], outputs=[cost], updates=updates)
    
    
    # Enable test values so shape mismatches are caught while the graph is built.
    theano.config.compute_test_value = 'raise'
    compile(10, [8, 6, 4, 8, 16], 32, 0.1, 0.01)
    

    Note the second for loop, which adds an L2 regularization component to the cost for each layer's parameters. The hidden layer sizes (and hence the number of layers) are passed to the function as the hidden_sizes parameter.
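
    Since the question specifically needs the symbolic expression in order to compute gradients later, it may also help to point out (a small variation, not part of the code above) that tt.grad accepts the whole parameter list at once, so inside compile the per-parameter grad calls could be written as:

    # Differentiate the dynamically built cost with respect to every parameter
    # in a single call; tt.grad returns one gradient expression per parameter.
    grads = tt.grad(cost, ws + bs)
    updates = [(p, p - learning_rate * g) for p, g in zip(ws + bs, grads)]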