python numpy optimization scipy lmfit

Scipy NNLS using mask


I am performing non-negative least squares using scipy. A trivial example would be as follows:

import numpy as np
from scipy.optimize import nnls

A = np.array([[60, 70, 120, 60],[60, 90, 120, 70]], dtype='float32')
b = np.array([6, 5])
x, res = nnls(A, b)

Now I have a situation where some entries in A or b can be missing (np.nan). Something like:

A_2 = A.copy()
A_2[0, 2] = np.nan

Of course, running NNLS on A_2 and b will not work, because SciPy rejects inputs that contain inf or NaN.

How can we perform NNLS while masking out the missing entries from the computation? Effectively, this should translate to

Minimize |(A_2.x - b)[mask]|

where mask can be defined as:

mask = ~np.isnan(A_2)

In general, entries can be missing from both A and b.

Possibly helpful:

[1] How to include constraint to Scipy NNLS function solution so that it sums to 1


Solution

  • I think you can compute the mask first (to determine which points you want included) and then perform NNLS. Given the mask

    In []: mask
    Out[]: 
    array([[ True,  True, False,  True],
           [ True,  True,  True,  True]], dtype=bool)
    

    you can decide whether to include a column by checking that all of its values are True, using np.all along the first axis (axis=0).

    In []: np.all(mask, axis=0)
    Out[]: array([ True,  True, False,  True], dtype=bool)
    

    The result can then be used as a column mask for A_2.

    In []: nnls(A_2[:,np.all(mask, axis=0)], b)
    Out[]: (array([ 0.09166667,  0.        ,  0.        ]), 0.7071067811865482)
    

    The same idea can be used for b: build a row mask from ~np.isnan(b) and apply it to the rows of A_2 and to b.
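
Putting the column mask and the row mask together, a minimal sketch might look like the following. The helper name masked_nnls and its return values are made up for illustration and are not part of SciPy. It assumes missing values are marked with NaN and that at least one row and one column survive the masking.

```python
import numpy as np
from scipy.optimize import nnls


def masked_nnls(A, b):
    """Hypothetical helper: run NNLS on the NaN-free part of A and b."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)

    # Row mask: keep only equations whose right-hand side is present.
    row_mask = ~np.isnan(b)
    # Column mask: among the kept rows, keep only columns with no NaN.
    col_mask = np.all(~np.isnan(A[row_mask]), axis=0)

    # np.ix_ selects the kept rows and kept columns in one step.
    x_kept, rnorm = nnls(A[np.ix_(row_mask, col_mask)], b[row_mask])

    # Scatter the solution back to full length; dropped unknowns stay 0.
    x = np.zeros(A.shape[1])
    x[col_mask] = x_kept
    return x, rnorm, row_mask, col_mask


A = np.array([[60, 70, 120, 60], [60, 90, 120, 70]], dtype=float)
b = np.array([6, 5], dtype=float)
A_2 = A.copy()
A_2[0, 2] = np.nan

x, rnorm, row_mask, col_mask = masked_nnls(A_2, b)
print(x, rnorm)
```

Keep in mind that dropping a column removes that unknown from the fit entirely, which is the same as fixing it to 0; that is why the sketch reports 0 for it in the full-length x. Whether that is acceptable depends on what the missing entry means in your problem.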