multidimensional-array scipy vectorization linear-algebra least-squares

how to solve many overdetermined systems of linear equations using vectorized codes?

I need to solve a system of linear equations Lx=b, where x is always a vector (3x1 array), L is an Nx3 array, and b is an Nx1 vector. N usually ranges from 4 to something like 10. I have no problems solving this using

scipy.linalg.lstsq(L,b)

However, I need to do this many times (something like 200x200=40000 times) as x is actually something associated with each pixel in an image. So x is actually stored in an PxQx3 array where P and Q is something like 200-300, and the last number '3' refers to the vector x. Right now I just loop through each column and row and solve the equation one-by-one .How do I solve those 40000 different overdetermined systems of linear equations efficiently, probably using some vectorization techniques or other special methods?

thanks

Solution

You can gain some speed by making use of the stack of matrices feature of numpy.linalg routines. This doesn't yet work for numpy.linalg.lstsq, but numpy.linalg.svd does, so you can implement lstsq yourself:

import numpy as np


def stacked_lstsq(L, b, rcond=1e-10):
    """
    Solve L x = b, via SVD least squares cutting of small singular values
    L is an array of shape (..., M, N) and b of shape (..., M).
    Returns x of shape (..., N)
    """
    u, s, v = np.linalg.svd(L, full_matrices=False)
    s_max = s.max(axis=-1, keepdims=True)
    s_min = rcond*s_max
    inv_s = np.zeros_like(s)
    inv_s[s >= s_min] = 1/s[s>=s_min]
    x = np.einsum('...ji,...j->...i', v,
                  inv_s * np.einsum('...ji,...j->...i', u, b.conj()))
    return np.conj(x, x)


def slow_lstsq(L, b):
    return np.array([np.linalg.lstsq(L[k], b[k])[0]
                     for k in range(L.shape[0])])    


def test_it():
    b = np.random.rand(1234, 3)
    L = np.random.rand(1234, 3, 6)

    x = stacked_lstsq(L, b)
    x2 = slow_lstsq(L, b)

    # Check
    print(x.shape, x2.shape)
    diff = abs(x - x2).max()
    print("difference: ", diff)
    assert diff < 1e-13


test_it()

Some timing suggests the stacked version is around 6x faster here, for that problem size. Whether it's worth the trouble depends on the problem.