I found that `scipy.optimize.minimize` works when I use `.item()` to retrieve a value from a NumPy array in the objective function, but it fails when I retrieve the value by indexing with `[0,0]`:
```python
def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def hyp_log(X, theta):
    return sigmoid(X @ theta)

def cost_log(theta, X, Y, reg_const=0):
    hyp = hyp_log(X, theta)
    return (Y.T @ -np.log(hyp) + (1-Y).T @ -np.log(1-hyp)).item() / len(X) \
        + reg_const * (theta[1:].T @ theta[1:]).item() / (2 * len(X))

result = minimize(cost_log, theta, args=(X, Y, reg_const), method='TNC')
```
If I use `[0,0]` indexing instead of `.item()` in the `cost_log` function, the function itself works exactly the same as before, but `minimize` raises `IndexError: too many indices for array`. I want to understand why this happens and what I should be careful of in the objective function when using `minimize`.
Since you have not provided `X` or `Y`, I won't look at

```python
(Y.T @ -np.log(hyp) + (1-Y).T @ -np.log(1-hyp))
```

but rather at

```python
(theta[1:].T @ theta[1:]).item()
```

If `theta` is (n,1):
```python
In [15]: theta = np.arange(5)[:,None]
In [16]: theta.shape
Out[16]: (5, 1)
In [17]: (theta[1:].T @ theta[1:])
Out[17]: array([[30]])
In [18]: (theta[1:].T @ theta[1:])[0,0]
Out[18]: 30
In [19]: (theta[1:].T @ theta[1:]).item()
Out[19]: 30
```
But if you give that `theta` to `minimize`, it ravels it to a (n,) shape:
```python
In [20]: theta = theta.ravel()
In [21]: (theta[1:].T @ theta[1:])
Out[21]: 30
In [22]: (theta[1:].T @ theta[1:]).shape
Out[22]: ()
In [23]: (theta[1:].T @ theta[1:]).item()
Out[23]: 30
In [24]: (theta[1:].T @ theta[1:])[0,0]
...
IndexError: invalid index to scalar variable.
```
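You can verify the raveling directly by recording the shape `minimize` actually passes to the objective. This is a toy quadratic of my own, not your `cost_log`, but it shows that even when `x0` is a (5,1) column vector, the callback always receives a 1-D (5,) array:

```python
import numpy as np
from scipy.optimize import minimize

shapes = []

def objective(theta):
    # record the shape minimize actually passes in
    shapes.append(theta.shape)
    # simple quadratic with minimum at theta = 0
    return float(theta @ theta)

theta0 = np.arange(5, dtype=float)[:, None]   # (5, 1) column vector
minimize(objective, theta0, method='TNC')
print(shapes[0])   # (5,) -- minimize flattened the column vector
```

Every entry of `shapes` is `(5,)`; the 2-D shape of `x0` never reaches the objective.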
As I wrote initially, `.item()` can be used with any single-item array, regardless of the number of dimensions. `[0,0]` indexing only works with a 2d (or higher) array.
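One defensive pattern, sketched here with a toy cost rather than your `cost_log`, is to `ravel()` the parameter vector at the top of the objective and use `.item()` (never `[0,0]`) to extract the scalar. The function then behaves identically whether it is called directly with a column vector or by `minimize` with a 1-D array:

```python
import numpy as np

def cost_quad(theta, X):
    # defensive: minimize always passes theta as 1-D, but ravel()
    # lets the function also accept a (n,1) column vector
    theta = np.asarray(theta).ravel()
    r = X @ theta
    # r @ r is a 0-d scalar for 1-D inputs; .item() converts it to a
    # plain Python float, whereas [0,0] would raise IndexError here
    return (r @ r).item() / (2 * len(X))

X = np.eye(3)
theta_col = np.ones((3, 1))   # works as a column vector...
theta_flat = np.ones(3)       # ...and as the 1-D array minimize passes
assert cost_quad(theta_col, X) == cost_quad(theta_flat, X)
```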