Tags: matlab, optimization, linear-algebra, least-squares, cvx

Least-squares minimization within threshold in MATLAB


The cvx suite for MATLAB can solve the (seemingly innocent) optimization problem below, but it is rather slow for the large, full matrices I'm working with. I'm hoping this is because using cvx is overkill, and that the problem actually has an analytic solution, or that a clever use of some built-in MATLAB functions can more quickly do the job.

Background: It is well-known that both x1=A\b and x2=pinv(A)*b solve the least-squares problem:

minimize norm(A*x-b)

with the distinction that norm(x2)<=norm(x1). In fact, x2 is the minimum-norm solution to the problem, so norm(x2)<=norm(x) for all possible solutions x.
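As a small illustration (with an arbitrary underdetermined example; the specific matrix is made up), the two solutions and the norm relation can be checked directly:

```matlab
% Underdetermined example: more unknowns than equations,
% so the least-squares problem has infinitely many minimizers.
A = [1 2 3; 4 5 6];      % 2x3
b = [1; 2];

x1 = A \ b;              % a basic least-squares solution
x2 = pinv(A) * b;        % the minimum-norm least-squares solution

% Both achieve the same (here zero) residual, but x2 has the
% smaller norm among all minimizers:
norm(A*x1 - b)           % same as norm(A*x2 - b), up to rounding
norm(x2) <= norm(x1)     % true (with equality only if x1 == x2)
```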

Defining D=norm(A*x2-b) (equivalently, D=norm(A*x1-b)), x2 solves the problem

minimize norm(x)
subject to
norm(A*x-b) == D

Problem: I'd like to find the solution to:

minimize norm(x)
subject to
norm(A*x-b) <= D+threshold

In words, I don't need norm(A*x-b) to be as small as possible, just within a certain tolerance. I want the minimum-norm solution x that gets A*x within D+threshold of b.

I haven't been able to find an analytic solution to the problem (like using the pseudoinverse in the classic least-squares problem) on the web or by hand. I've been searching things like "least squares with nonlinear constraint" and "least squares with threshold".

Any insights would be greatly appreciated, but I suppose my real question is: What is the fastest way to solve this "thresholded" least-squares problem in MATLAB?
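For reference, the cvx formulation of the thresholded problem presumably looks something like the sketch below (variable dimensions and the way D is precomputed are assumptions, not taken from the question):

```matlab
% Precompute the best achievable residual from the minimum-norm solution:
D = norm(A*pinv(A)*b - b);

% cvx formulation of the thresholded problem (slow for large dense A):
cvx_begin
    variable x(n)
    minimize( norm(x) )
    subject to
        norm(A*x - b) <= D + threshold
cvx_end
```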


Solution

  • Interesting question. I do not know the answer to your exact question, but a working solution is presented below.

    Recap

    Define res(x) := norm(Ax - b). As you state, x2 minimizes res(x). In the overdetermined case (typically A having more rows than columns), x2 is the unique minimizer. In the underdetermined case, it is joined by infinitely many others*. However, among all of these, x2 is the unique one that minimizes norm(x).

    To summarize, x2 minimizes (1) res(x) and (2) norm(x), and it does so in that order of priority. In fact, this characterizes (fully determines) x2.

    The limit characterization

    But, another characterization of x2 is

    x2 := limit_{e-->0} x_e
    

    where

    x_e := argmin_{x} J(x;e)
    

    where

    J(x;e) := res(x)^2 + e * norm(x)^2
    

    It can be shown that

    x_e = (A'A + e I)^{-1} A' b      (eqn a)
    

    It should be appreciated that this characterization of x2 is quite magical. The limit exists even if (A'A)^{-1} does not. And the limit somehow preserves priority (2) from above.
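    In MATLAB, a direct (if naive) way to evaluate (eqn a) for a given e > 0 is:

```matlab
% Ridge/Tikhonov solution for a given e > 0.
% Solving the normal equations with backslash is preferable to
% forming an explicit inverse.
x_e = (A'*A + e*eye(size(A,2))) \ (A'*b);
```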

    Using e>0

    Of course, for finite (but small) e, x_e will not minimize res(x) (instead it minimizes J(x;e)). In your terminology, the difference is the threshold. I will rename it to

    gap := res(x_e) - min_{x} res(x).
    

    Decreasing the value of e is guaranteed to decrease the value of the gap. Reaching a specific gap value (i.e. the threshold) is therefore easy to achieve by tuning e.**
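    Since the gap shrinks monotonically as e decreases, a simple bisection on e can home in on the desired threshold. The sketch below is illustrative only (iteration count and bracketing strategy are arbitrary choices):

```matlab
% Bisection on e so that gap(e) = res(x_e) - Dmin lands just
% below the threshold. Assumes gap is monotone increasing in e.
Dmin  = norm(A*pinv(A)*b - b);           % best achievable residual
ridge = @(e) (A'*A + e*eye(size(A,2))) \ (A'*b);
gap   = @(e) norm(A*ridge(e) - b) - Dmin;

lo = 0; hi = 1;
while gap(hi) < threshold, hi = 2*hi; end   % bracket the target
for k = 1:60                                % bisect
    mid = (lo + hi)/2;
    if gap(mid) < threshold, lo = mid; else, hi = mid; end
end
% lo now gives gap(lo) <= threshold; if lo is still 0 (rank-deficient
% A'A), fall back to x = pinv(A)*b instead.
x = ridge(lo);
```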

    This type of modification (adding a norm(x) penalty to the res(x) minimization problem) is known as regularization (specifically, Tikhonov or ridge regularization) in the statistics literature, and is generally considered a good idea for stability (numerically and with respect to parameter values).


    *: Note that x1 and x2 differ only in the underdetermined case

    **: It does not even require any heavy computation, because the inverse in (eqn a) is easily computed for any (positive) value of e once the SVD of A has been computed.
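    Concretely, with a precomputed economy SVD A = U*S*V', the solution x_e = (A'A + e I)^{-1} A' b reduces to a cheap diagonal rescaling for each new e (a standard filter-factor identity; verify via A'A = V*S^2*V' and A'b = V*S*U'*b):

```matlab
% Precompute once (the expensive part):
[U, S, V] = svd(A, 'econ');
s  = diag(S);       % singular values
Ub = U' * b;

% Then x_e for any e > 0 is just a diagonal rescaling,
% cheap enough to evaluate inside a bisection loop:
ridge_svd = @(e) V * ((s ./ (s.^2 + e)) .* Ub);
```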