Tags: optimization, scipy, logistic-regression

`scipy.optimize` functions hang even with `maxiter=0`


I am trying to train a simple multi-class logistic regression model on the MNIST data (which I downloaded from Kaggle), but the scipy.optimize functions hang.

Here's the code:

import csv
from math import exp
from numpy import *
from scipy.optimize import fmin, fmin_cg, fmin_powell, fmin_bfgs

# Prepare the data

def getIiter(ifname):
    """
    Get the iterator from a csv file with filename ifname
    """
    ifile = open(ifname, 'r')
    iiter = csv.reader(ifile)
    next(iiter)  # skip the header row
    return iiter

def parseRow(s):
    # Each CSV row is: label, pixel0, pixel1, ..., pixel783
    y = [int(x) for x in s]
    lab = y[0]
    z = y[1:]
    return (lab, z)

def getAllRows(ifname):
    iiter = getIiter(ifname)
    x = []
    l = []
    for row in iiter:
        lab, z = parseRow(row)
        x.append(z)
        l.append(lab)
    return x, l

def cutData(x, y):
    """
    70% training
    30% testing
    """
    m = len(x)
    t = int(m * .7)
    return [(x[:t], y[:t]), (x[t:], y[t:])]

def num2IndMat(l):
    """
    Convert a list of digit labels into an m x 10 one-hot indicator matrix
    """
    t = array(l)
    tt = [vectorize(int)((t == i)) for i in range(10)]
    return array(tt).T

def readData(ifname):
    x, l = getAllRows(ifname)
    t = [[1] + y for y in x]  # prepend a bias term of 1 to every sample
    return array(t), num2IndMat(l)

#Calculate the cost function

def sigmoid(x):
    return 1 / (1 + exp(-x))

vSigmoid = vectorize(sigmoid)
vLog = vectorize(log)

def costFunction(theta, x, y):
    # Average cross-entropy over all m samples and N classes
    # (m, n and N are the globals defined below)
    sigxt = vSigmoid(dot(x, theta))
    cm = (- y * vLog(sigxt) - (1 - y) * vLog(1 - sigxt)) / m / N
    return sum(cm)

def unflatten(flatTheta):
    # Rebuild the (n + 1) x N parameter matrix from its flat form
    return [flatTheta[i * N : (i + 1) * N] for i in range(n + 1)]

def costFunctionFlatTheta(flatTheta):
    return costFunction(unflatten(flatTheta), trainX, trainY)

def costFunctionFlatTheta1(flatTheta):
    return costFunction(flatTheta.reshape(785, 10), trainX, trainY)

x, y = readData('train.csv')
[(trainX, trainY), (testX, testY)] = cutData(x, y)

m = len(trainX)
n = len(trainX[0]) - 1
N = len(trainY[0])

initTheta = zeros(((n + 1), N))
flatInitTheta = ndarray.flatten(initTheta)
flatInitTheta1 = initTheta.reshape(1, -1)

In the last two lines we flatten initTheta because the fmin{,_cg,_bfgs,_powell} functions seem to accept only vectors as the initial-value argument x0. I also flattened initTheta using reshape, hoping the approach from this answer would help.
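
For illustration, here is a minimal sketch (with a hypothetical 2x3 toy array) of what each flattening produces:

from numpy import arange

a = arange(6).reshape(2, 3)
print(a.flatten().shape)       # (6,) -- a true 1-D vector
print(a.reshape(1, -1).shape)  # (1, 6) -- still 2-D, with a leading axis of length 1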

There is no problem computing the cost function, which takes less than 2 seconds on my computer:

print(costFunctionFlatTheta(flatInitTheta), costFunctionFlatTheta1(flatInitTheta1))
# 0.69314718056 0.69314718056
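
(That value is no accident: with all-zero weights every sigmoid output is 0.5, so each of the m * N cost terms is -log(1/2), and the average is log 2 ≈ 0.693147.)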

But all the fmin functions hang, even if I set maxiter=0. For example:

newFlatTheta = fmin(costFunctionFlatTheta, flatInitTheta, maxiter=0)

or

newFlatTheta1 = fmin(costFunctionFlatTheta1, flatInitTheta1, maxiter=0)

When I interrupt the program, it appears to hang at lines in optimize.py that call the cost function, such as:

return function(*(wrapper_args + args))

For example, if I use fmin_cg, this would be line 292 in optimize.py (Version 0.5).

How do I solve this problem?


Solution

  • I found a way to stop fmin_cg from hanging.

    Basically I just needed to write a function that computes the gradient of the cost function analytically, and pass it to the fprime parameter of fmin_cg.

    def gradient(theta, x, y):
        # Analytic gradient of the cost: x^T (sigmoid(x theta) - y),
        # scaled by the same 1 / (m * N) factor as the cost itself
        return dot(x.T, vSigmoid(dot(x, theta)) - y) / m / N
    
    def gradientFlatTheta(flatTheta):
        # fmin_cg passes and expects flat vectors, so unflatten, then re-flatten
        return ndarray.flatten(gradient(flatTheta.reshape(785, 10), trainX, trainY))
    

    Then

    newFlatTheta = fmin_cg(costFunctionFlatTheta, flatInitTheta, fprime=gradientFlatTheta, maxiter=0)
    

    terminates within seconds, and setting maxiter to a higher number (say 100) trains the model in a reasonable amount of time.
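
    As a quick sanity check (a sketch using the variables defined above), one can verify that the gradient handed to fmin_cg is a flat vector of the same length as flatInitTheta:

    g = gradientFlatTheta(flatInitTheta)
    print(g.shape == flatInitTheta.shape)  # True -- both are (7850,)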

    The documentation of fmin_cg says the gradient is numerically approximated if no fprime is given, which is what I suspect caused the hanging.
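
    A back-of-the-envelope estimate (a sketch, assuming the ~2 second cost-function timing measured above) shows why the numerical gradient looks like a hang: forward differences need one extra cost evaluation per parameter.

    n_params = 785 * 10           # length of flatTheta
    secs_per_eval = 2             # cost-function timing from above
    print(n_params * secs_per_eval / 3600)  # roughly 4.4 hours per gradient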

    Thanks to this notebook by zgo2016@Kaggle, which helped me find the solution.