python, machine-learning, logistic-regression, theano

Python + Theano: Logistic regression weights do not update


I've compared extensively to existing tutorials, but I can't figure out why my weights don't update. Here is the function that returns the list of updates:

def get_updates(cost, params, learning_rate):
    # standard gradient descent: new_param = param - learning_rate * d(cost)/d(param)
    updates = []
    for param in params:
        updates.append((param, param - learning_rate * T.grad(cost, param)))
    return updates

It is defined at the top level, outside of any classes. This is standard gradient descent for each param. The 'params' argument here is fed in as mlp.params, which is simply the concatenation of the param lists for each layer. I removed every layer except a logistic regression one to isolate why my cost was not decreasing. The following is the definition of mlp.params in the MLP constructor; it follows the definition of each layer and their respective param lists.

self.params = []
for layer in self.layers:
    self.params += layer.params

The following is the train function, which I call for each minibatch during each epoch:

train = theano.function([minibatch_index], cost,
                    updates=get_updates(cost, mlp.params, learning_rate),
                    givens= {
                        x: train_set_x[minibatch_index * batch_size : (minibatch_index + 1) * batch_size],
                        y: train_set_y[minibatch_index * batch_size : (minibatch_index + 1) * batch_size]
                    })
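
Roughly, the loop that drives this function looks like the following (simplified sketch, not verbatim from my file; n_epochs and n_train_batches come from the dataset size):

for epoch in range(n_epochs):
    for minibatch_index in range(n_train_batches):
        minibatch_cost = train(minibatch_index)
        # the cost is printed here for monitoring in the full file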

If you require further details, the entire file is available here: http://pastebin.com/EeNmXfGD

I don't know how many people use Theano (it doesn't seem like many); if you've read to this point, thank you.

Fixed: I've determined that I can't use average squared error as the cost function. It works as expected after replacing it with a negative log-likelihood.


Solution

  • This behavior is caused by a few things, but it comes down to the cost not being properly computed. In your implementation, the output of the LogisticRegression layer is the predicted class for every input digit (obtained with the argmax operation), and you take the squared difference between it and the expected prediction.

    This will give you gradients of 0 with respect to every parameter in your model, because the gradient of the output of the argmax (the predicted class) with respect to its input (the class probabilities) is 0: the argmax is piecewise constant.

    Instead, the LogisticRegression layer should output the class probabilities:

    def output(self, input):
        # flatten each example to a vector, then apply softmax to the linear
        # scores so the layer returns one probability per class
        input = input.flatten(2)
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)
        return self.p_y_given_x
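
    You can still take an argmax over these probabilities to report the predicted class or an error rate, as long as it stays outside the cost you differentiate. A possible sketch (hypothetical, using the symbolic variables x_ and y from your script):

    # evaluation expressions only; no gradient is taken through the argmax here
    y_pred = T.argmax(mlp.output(x_), axis=1)
    error_rate = T.mean(T.neq(y_pred, y))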
    

    Then, in the MLP class, you compute the cost. You could use the mean squared error between the desired probabilities for each class and the probabilities computed by the model (a sketch of that variant follows below), but people tend to use the negative log-likelihood of the expected classes, which you can implement in the MLP class as follows:

    def neg_log_likelihood(self, x, y):
        p_y_given_x = self.output(x)
        # mean negative log-probability assigned to the correct class of each example
        return -T.mean(T.log(p_y_given_x)[T.arange(y.shape[0]), y])
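
    If you did want to try the mean-squared-error variant mentioned above, a possible sketch (hypothetical, not from the original code) would compare the probabilities against one-hot targets built from the integer labels:

    def mse(self, x, y):
        # hypothetical alternative cost: squared error between the class
        # probabilities and a one-hot encoding of y
        p_y_given_x = self.output(x)
        n_classes = p_y_given_x.shape[1]
        y_one_hot = T.eq(T.arange(n_classes).dimshuffle('x', 0),
                         y.dimshuffle(0, 'x'))
        return T.mean(T.sqr(p_y_given_x - y_one_hot))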
    

    Then you can use neg_log_likelihood to compute your cost, and the model trains:

    cost = mlp.neg_log_likelihood(x_, y)
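
    A quick way to confirm that the parameters are now moving (a hypothetical check, assuming the first layer exposes its W shared variable as in the LogisticRegression class above):

    W_before = mlp.layers[0].W.get_value()
    train(0)                                  # one gradient step on minibatch 0
    W_after = mlp.layers[0].W.get_value()
    print(abs(W_after - W_before).max())      # should now be greater than 0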
    

    A few additional things:

    • At line 215, when you print your cost, you format it as an integer value but it is a floating point value; this will lose precision in the monitoring.
    • Initializing all the weights to 0, as you do in your LogisticRegression class, is often not recommended. Weights should differ in their initial values to help break symmetry; see the sketch below.
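
    A minimal sketch of such an initialization (hypothetical values; n_in and n_out are the layer's dimensions, and numpy/theano are imported as in your script):

    rng = numpy.random.RandomState(1234)
    W_values = numpy.asarray(rng.uniform(low=-0.01, high=0.01, size=(n_in, n_out)),
                             dtype=theano.config.floatX)
    # small random weights break the symmetry; the bias can stay at zero
    self.W = theano.shared(value=W_values, name='W', borrow=True)
    self.b = theano.shared(value=numpy.zeros((n_out,), dtype=theano.config.floatX),
                           name='b', borrow=True)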