Tags: python, numpy, machine-learning, scipy, scipy-optimize

Scipy minimization says it is successful, then continues with warnings


I'm attempting to minimize a function, and I'm displaying the progress SciPy reports as it runs. The first message displayed is:

Optimization terminated successfully.
         Current function value: 0.000113
         Iterations: 32
         Function evaluations: 13299
         Gradient evaluations: 33

This looks promising. The problem is that the process does not terminate. In fact, it continues with messages like

Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.023312
         Iterations: 50
         Function evaluations: 20553
         Gradient evaluations: 51
Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.068360
         Iterations: 50
         Function evaluations: 20553
         Gradient evaluations: 51
Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.071812
         Iterations: 50
         Function evaluations: 20553
         Gradient evaluations: 51
Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.050061
         Iterations: 50
         Function evaluations: 20553
         Gradient evaluations: 51

Below is the code with the call to minimize inside:

from scipy.optimize import minimize
import numpy as np

def one_vs_all(X, y, num_labels, lmbda):
  
  # store dimensions of X that will be reused
  m = X.shape[0]
  n = X.shape[1]

  # append ones vector to X matrix
  X = np.column_stack((np.ones((X.shape[0], 1)),X))

  # create vector in which thetas will be returned
  all_theta = np.zeros((num_labels, n+1))
  
  # choose initial thetas
  #init_theta = np.zeros((n+1, 1))

  for i in np.arange(num_labels):
    # note theta should be first arg in objective func signature followed by X and y
    init_theta = np.zeros((n+1,1))
    theta = minimize(lrCostFunctionReg, x0=init_theta, args=(X, (y == i)*1, lmbda),
                      options={'disp':True, 'maxiter':50})
    all_theta[i] = theta.x
  return all_theta

I've tried changing minimization methods and varying the number of iterations from as low as 30 to as high as 1000. I've also tried supplying my own gradient function. In all cases, the routine ultimately supplies an answer, but it is dead wrong. Does anyone know what is happening?

EDIT: The function is differentiable. Here is the cost function, followed by its gradient (unregularized, then regularized).

def lrCostFunctionReg(theta, X, y, lmbda):
  
  m = X.shape[0]

  # unregularized cost
  h = sigmoid(X @ theta)

  # calculate regularization term
  reg_term = ((lmbda / (2*m)) * (theta[1:,].T @ theta[1:,]))
  
  cost_reg = (1/m) * (-(y.T @ np.log(h)) - ((1 - y).T @ np.log(1 - h))) + reg_term

  return cost_reg

def gradFunction(theta, X, y):
  m = X.shape[0]

  theta = np.reshape(theta,(theta.size,1))
  
  # hypothesis as generated in cost function
  h = sigmoid(X@theta)

  # unregularized gradient
  grad = (1/m) * np.dot(X.T, (h-y))

  return grad

def lrGradFunctionReg(theta, X, y, lmbda):
  
  m = X.shape[0]

  # theta reshaped to ensure proper operation
  theta = np.reshape(theta,(theta.size,1))

  # generate unregularized gradient
  grad = gradFunction(theta, X, y)
  
  # calc regularized gradient w/o touching intercept; essential that only 1 index used
  grad[1:,] = ((lmbda / m) * theta[1:,]) + grad[1:,]

  return grad.flatten()

Solution

  • To answer my own question, the issue turned out to be one of vector shape. I enjoy coding in 2D, but SciPy's optimization routines only work with column and row vectors that have been "flattened" into 1-D arrays. Genuinely multi-dimensional matrices are fine, but explicit column and row vectors (shapes like (n, 1) and (1, n)) are a bridge too far.

    For example, if y is a vector of labels and y.shape is (400,1), you would need to use y.flatten() on y, which would make y.shape = (400,). Then SciPy would work with your data assuming all other dimensions made sense.

    So, if your efforts to translate MATLAB machine learning code to Python have stalled, check to ensure you have flattened your row and column vectors, especially those returned by a gradient function.
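
    Concretely, here is a minimal sketch of the flattened setup. The toy data below is made up for illustration, and the cost and gradient are rewritten versions of the functions above that work purely with 1-D arrays; the gradient returns an array with the same shape as theta, which is what minimize expects from jac:

    import numpy as np
    from scipy.optimize import minimize

    def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

    def cost_flat(theta, X, y, lmbda):
      # theta and y are 1-D, so the result is a plain scalar
      m = X.shape[0]
      h = sigmoid(X @ theta)
      reg_term = (lmbda / (2*m)) * np.sum(theta[1:] ** 2)
      return (1/m) * (-(y @ np.log(h)) - ((1 - y) @ np.log(1 - h))) + reg_term

    def grad_flat(theta, X, y, lmbda):
      # returns a 1-D array the same length as theta, as minimize requires
      m = X.shape[0]
      h = sigmoid(X @ theta)
      grad = (1/m) * (X.T @ (h - y))
      grad[1:] += (lmbda / m) * theta[1:]
      return grad

    # toy data: 20 examples, 3 features plus an intercept column
    rng = np.random.default_rng(0)
    X = np.column_stack((np.ones(20), rng.normal(size=(20, 3))))
    y = (rng.random(20) > 0.5).astype(float)   # labels flattened to shape (20,)

    init_theta = np.zeros(X.shape[1])          # shape (4,), not (4, 1)
    res = minimize(cost_flat, x0=init_theta, args=(X, y, 0.1),
                   jac=grad_flat, options={'disp': True, 'maxiter': 50})
    print(res.success, res.x)

    The same shape rule applies to init_theta inside the one_vs_all loop: use np.zeros(n+1) rather than np.zeros((n+1, 1)), and flatten whatever your gradient function returns.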