Tags: python, numpy, least-squares

Efficient computation of the least-squares algorithm in NumPy


I need to solve a large set of linear systems in the least-squares sense. I am having trouble understanding the difference in computational efficiency between numpy.linalg.lstsq(a, b), np.dot(np.linalg.pinv(a), b), and the direct mathematical implementation via the normal equations.

I use the following matrices:

import numpy as np

h = np.random.random((50000, 100))
a = h[:, :-1].copy()
b = -h[:, -1].copy()

and the timing results of the three approaches are:


%%timeit
# mathematical implementation (normal equations with an explicit inverse)
np.dot(np.dot(np.linalg.inv(np.dot(a.T, a)), a.T), b)

10 loops, best of 3: 36.3 ms per loop


%%timeit
# numpy.linalg.lstsq implementation
np.linalg.lstsq(a, b)[0]

10 loops, best of 3: 103 ms per loop


%%timeit
# pseudoinverse implementation
np.dot(np.linalg.pinv(a), b)

1 loop, best of 3: 216 ms per loop
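
For reference, a quick check (outside the timings) confirms that all three approaches agree on this well-conditioned problem:

# sanity check: the three implementations give the same solution here
x1 = np.dot(np.dot(np.linalg.inv(np.dot(a.T, a)), a.T), b)
x2 = np.linalg.lstsq(a, b)[0]
x3 = np.dot(np.linalg.pinv(a), b)
print(np.allclose(x1, x2), np.allclose(x1, x3))  # expect: True True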


Why is there a difference?


Solution

  • The routine lstsq handles any system: over-determined, under-determined, or well-determined. Its output is what you'd get from pinv(a) @ b, but it is faster than computing the pseudoinverse. Here's why:

    General advice: do not compute the inverse of a matrix unless you actually need it. Solving a system for a particular right-hand side is faster than inverting its matrix.
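
    To see this on a plain square system (a small sketch, not part of the original answer): np.linalg.solve does one LU factorization plus triangular solves for the given right-hand side, while np.linalg.inv effectively solves for all n columns of the identity first:

    import numpy as np

    M = np.random.random((2000, 2000))   # a generic square system
    y = np.random.random(2000)

    x_solve = np.linalg.solve(M, y)       # factorize once, solve for y directly
    x_inv = np.dot(np.linalg.inv(M), y)   # factorize, invert (n extra solves), then multiply

    # same answer up to rounding, but solve is noticeably faster
    assert np.allclose(x_solve, x_inv)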

    Yet, your approach of solving the normal equations aᵀa x = aᵀb is faster, even though you are inverting a matrix. What gives? The thing is, inverting aᵀa is only valid when a has full column rank, and it shrinks the problem from 50000×99 down to 99×99, whereas lstsq works on the full matrix. So you've restricted the problem to this particular situation, and gained speed as a trade-off for less generality and, as I show below, less safety.

    But inverting a matrix is still inefficient. If you know that a has full column rank, the following is faster than any of your three attempts:

    np.linalg.solve(np.dot(a.T, a), np.dot(a.T, b))
    
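    If you know a bit more structure, you can go further still: for full-column-rank a, the matrix aᵀa is symmetric positive definite, so a Cholesky-based solver applies. Here is a sketch using SciPy's cho_factor/cho_solve, reusing a and b from the question (scipy.linalg.solve with assume_a='pos' is an alternative):

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    ata = np.dot(a.T, a)        # symmetric positive definite for full-rank a
    atb = np.dot(a.T, b)
    c, low = cho_factor(ata)    # Cholesky factorization, about half the cost of LU
    x = cho_solve((c, low), atb)
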

    That said, lstsq is still preferable to the above when dealing with poorly conditioned matrices. Forming the product aᵀa roughly squares the condition number, so you are more likely to get meaningless results. Here is a cautionary example, using SciPy's linalg module (which is essentially equivalent to NumPy's but has more methods):

    import numpy as np
    import scipy.linalg as sl

    a = sl.hilbert(10)    # a poorly conditioned square matrix of size 10
    b = np.arange(10)     # right-hand side
    sol1 = sl.solve(a, b)                              # direct solve (LU)
    sol2 = sl.lstsq(a, b)[0]                           # least squares (SVD)
    sol3 = sl.solve(np.dot(a.T, a), np.dot(a.T, b))    # normal equations
    
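    The squaring of the condition number is easy to check directly (a quick sanity check on the example above):

    print(np.linalg.cond(a))   # ~1.6e13: ill-conditioned, but still representable
    # its square, ~2.6e26, dwarfs 1/eps ~ 4.5e15 for float64,
    # so a.T @ a is numerically singular in double precision
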

    Here lstsq gives almost the same output as solve (the unique solution of this system). Yet, sol3 is totally wrong because of numerical issues, and you won't even get a warning about it.

    sol1:

      [ -9.89821788e+02,   9.70047434e+04,  -2.30439738e+06,
         2.30601241e+07,  -1.19805858e+08,   3.55637424e+08,
        -6.25523002e+08,   6.44058066e+08,  -3.58346765e+08,
         8.31333426e+07]
    

    sol2:

      [ -9.89864366e+02,   9.70082635e+04,  -2.30446978e+06,
         2.30607638e+07,  -1.19808838e+08,   3.55645452e+08,
        -6.25535946e+08,   6.44070387e+08,  -3.58353147e+08,
         8.31347297e+07]
    

    sol3:

      [  1.06913852e+03,  -4.61691763e+04,   4.83968833e+05,
        -2.08929571e+06,   4.55280530e+06,  -5.88433943e+06,
         5.92025910e+06,  -5.56507455e+06,   3.62262620e+06,
        -9.94523917e+05]
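
    A practical way to spot the failure without knowing the true solution is to compare residuals (a quick check, reusing the variables from the snippet above); sol3's residual should come out orders of magnitude larger than the other two:

    # small residual = consistent solution; a large one betrays the breakdown
    for name, sol in [("sol1", sol1), ("sol2", sol2), ("sol3", sol3)]:
        print(name, np.linalg.norm(np.dot(a, sol) - b))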