I am trying to run a large-scale linear program and am translating some of my previous code from MATLAB into Python. However, MATLAB and Python are giving me drastically conflicting answers: the MATLAB code finds the optimal solution, but the Python code reports that the problem is infeasible. This is an LP formulation of ell_infinity regression, i.e. minimax regression. I am fairly confident in the setup of both functions.
MATLAB code:
function [ x_opt, f_opt, exitflag, output] = ell_infinity_reg_solver( A, b )
% Solves the ell_infinity regression problem ||Ax - b||_inf. That is, finds
% the least t for which Ax - b < t.ones and Ax - b > -t.ones.
[n,d] = size(A) ;
if n == 0
    f_opt = 0 ;
    x_opt = zeros(d,1) ;
    return
end
% infinity norm
f = [ zeros(d,1); 1 ];
A = sparse([ A, -ones(n,1) ; -A, -ones(n,1) ]);
b = [ b; -b ];
[xt, f_opt, exitflag, output] = linprog(f,A,b);
x_opt = xt(1:d,:);
end
Python code:
from scipy.optimize import linprog
import numpy as np

def ell_infinity_regression_solver(A, b):
    """Compute the solution to the ell_infinity regression problem.
    x_hat = arg min_x ||Ax - b||_infty by reformulating as the LP:
        min t :
        Ax - b <= t * ones
        b - Ax >= - t*ones
    i.e. finds minimal radius ball enclosing distance between all data points
    and target values b.
    Input:
        A - array or dataframe of dimension n by d
        b - target vector of size n by 1.
    Output:
        x_opt which solves the optimisation.
        f_opt the radius of the enclosing ball
    """
    n, d = A.shape
    # include a check on n and d to ensure dimensionality makes sense
    if n > 0:
        f = np.vstack((np.zeros((d, 1)), [1])).ravel()
        print("shape of f {}".format(f.shape))  # should be d+1 length
        big_A = np.hstack((np.vstack((A, -A)), np.ones((2*n, 1))))
        print("shape of big_A {}".format(big_A.shape))  # should be 2n*(d+1)
        # big_b = np.vstack((b, -b))  # should be length 2n
        big_b = b.append(-b)  # only works for data frames -- switch to np array?
        print("Type of b {}".format(type(b)))
        print("Shape of b {}".format(b.shape))
        print("shape of big_b {}".format(big_b.shape))
        output = linprog(f, A_ub=big_A, b_ub=big_b)
        # print(output)
    else:
        f_opt = 0
        x_opt = np.zeros_like(b)
    return output
As the scipy method didn't work, I also tried CVXOPT with the added lines:
c = cvxopt.matrix(f)
G = cvxopt.matrix(X)
h = cvxopt.matrix(Y)
output = cvxopt.solvers.lp(c, G, h, solver='glpk')
But again this returned a flag of dual-infeasible, which contrasts with the MATLAB exitflag of 1 (success) obtained with the dual-simplex algorithm.
The data I tested on is a 500 by 11 data matrix with an associated set of response variables.
I am interested in what has caused such a significant difference in the outputs, and which of them is correct. One thing that adds to the confusion is that reducing the input to one which is less than full rank doesn't seem to affect the MATLAB output flag, yet the Python code still doesn't find an optimal solution (as expected, I think).
The data is at https://www.dropbox.com/s/g4qcj1q0ncljawq/dataset.csv?dl=0 and the matrix A is the first 11 columns and the target vector is the final column.
(1) Default variable bounds are almost always zero and infinity, i.e., most solvers assume non-negative variables unless otherwise stated. MATLAB uses a different default: there, variables are free by default.
For your model this means you will need to explicitly tell linprog that the x variables are free. The t variable can be either free or non-negative.
Thus
output = linprog(f, A_ub=big_A, b_ub=big_b, bounds=(None, None), method='interior-point')
The model is a bit large for the simplex method (scipy's simplex implementation is somewhat of a toy algorithm); the newer interior-point method has no problem with it.
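Alternatively, to keep the x variables free while restricting t to be non-negative, a per-variable bounds list can be passed instead (a small sketch, assuming f, big_A, big_b and d are defined as in the question):
bounds = [(None, None)] * d + [(0, None)]   # x entries free, t >= 0
output = linprog(f, A_ub=big_A, b_ub=big_b, bounds=bounds, method='interior-point')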
(2) The line
big_A = np.hstack((np.vstack((A,-A)), np.ones((2*n,1))))
should probably read
big_A = np.hstack((np.vstack((A,-A)), -np.ones((2*n,1))))
(3) Also note that your model
min t :
Ax - b <= t * ones
b - Ax >= - t*ones
is incorrect. It should be:
min t :
Ax - b <= t * ones
Ax - b >= - t*ones
This gives as LP input:
min t :
Ax - t*ones <= b
-Ax - t*ones <= -b
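Putting the three fixes together, a corrected version of the Python solver could look roughly like this (a sketch, not run on the linked data; it assumes A is an n-by-d NumPy array and b a length-n NumPy array):

from scipy.optimize import linprog
import numpy as np

def ell_infinity_regression_solver(A, b):
    # Solve min_x ||Ax - b||_inf via the LP: min t s.t. Ax - t*ones <= b, -Ax - t*ones <= -b
    n, d = A.shape
    if n == 0:
        return np.zeros(d), 0.0
    f = np.concatenate([np.zeros(d), [1.0]])                        # objective: minimise t
    big_A = np.hstack([np.vstack([A, -A]), -np.ones((2 * n, 1))])   # note the -ones column
    big_b = np.concatenate([b, -b])
    bounds = [(None, None)] * d + [(0, None)]                       # x free, t >= 0
    res = linprog(f, A_ub=big_A, b_ub=big_b, bounds=bounds, method='interior-point')
    return res.x[:d], res.fun

This mirrors the MATLAB function above: the x block of the solution is returned together with the optimal t, i.e. the ell_infinity norm of the residual.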
(There are also other formulations of this problem, some of which do not store A twice. See [link].)