
scipy.optimize.minimize can't find least square solution


My intention was to solve an L1-error fitting problem using scipy.optimize, as suggested in "L1 norm instead of L2 norm for cost function in regression model", but I kept getting the wrong solution. So I debugged with ordinary least squares instead, for which we know the closed-form solution:

import numpy as np
from scipy.optimize import minimize

n = 10
d = 2
A = np.random.rand(n, d)
x = np.random.rand(d, 1)
b = np.dot(A, x)            # b lies exactly in the column space of A

# closed-form least-squares solution for reference
x_hat = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(np.dot(A, x_hat) - b))


def fit(X, params):
    return X.dot(params)

def cost_function(params, X, y):
    return np.linalg.norm(y - fit(X, params))

x0 = np.zeros((d, 1))

output = minimize(cost_function, x0, args=(A, b))

y_hat = fit(A, output.x)
print(np.linalg.norm(y_hat - b))
print(output)

The output is:

4.726604209672303e-16
2.2714597315189407
      fun: 2.2714597315189407
 hess_inv: array([[ 0.19434496, -0.1424377 ],
       [-0.1424377 ,  0.16718703]])
      jac: array([ 3.57627869e-07, -2.98023224e-08])
  message: 'Optimization terminated successfully.'
     nfev: 32
      nit: 4
     njev: 8
   status: 0
  success: True
        x: array([0.372247  , 0.32633966])

which looks super weird, because it can't even solve the plain L2 regression. Am I making a stupid mistake here?


Solution

  • The error is due to mismatched shapes. minimize flattens x0, so the iterates params have shape (d,). Then fit(X, params) has shape (n,), while your y has shape (n, 1). The expression y - fit(X, params) therefore has shape (n, n) because of NumPy's implicit broadcasting, and as a consequence np.linalg.norm returns the Frobenius matrix norm instead of the Euclidean vector norm (see the sketch after the fix below).

    Solution: change your cost function to:

    def cost_function(params, X, y):
        # reshape the flat (d,) iterate back into a (d, 1) column so the
        # subtraction stays a vector instead of broadcasting to (n, n)
        return np.linalg.norm(y - fit(X, params.reshape(d, 1)))
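
    To see the broadcasting issue in isolation, here is a minimal sketch, reusing n, d, A, b, and fit from the question:

    params = np.zeros(d)                  # minimize() flattens x0 to shape (d,)
    print(fit(A, params).shape)           # (10,)   -- 1-D prediction vector
    print(b.shape)                        # (10, 1) -- 2-D column vector
    print((b - fit(A, params)).shape)     # (10, 10) -- broadcast to a matrix!
    # np.linalg.norm of that (10, 10) array is the Frobenius matrix norm,
    # not the Euclidean norm of the residual you intended to minimize.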
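
    With the reshaped cost function in place, rerunning the optimization should reproduce the closed-form least-squares fit up to the optimizer's tolerance, and the same reshape carries over to the L1 objective you were originally after. A sketch, assuming the variables above are still in scope (l1_cost is just an illustrative name):

    output = minimize(cost_function, x0, args=(A, b))
    y_hat = fit(A, output.x.reshape(d, 1))
    print(np.linalg.norm(y_hat - b))      # now close to zero, like np.linalg.lstsq

    # the same reshape fixes the L1 cost; a derivative-free method such as
    # Nelder-Mead tends to cope better with this non-smooth objective
    def l1_cost(params, X, y):
        return np.sum(np.abs(y - fit(X, params.reshape(d, 1))))

    output_l1 = minimize(l1_cost, x0, args=(A, b), method='Nelder-Mead')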