python IndexError for masked array

I have data formatted as a 3-dimensional array, and I am performing a regression for each element along one of the axes. The following code works as I expect and returns a list of the regression slopes.

import numpy as np
from scipy import stats
import numpy.ma as ma

#make array
np.random.seed(0)
array = np.random.random((4,3,2))

def regress_slope(array):
  N=array.shape[0]
  alpha=0.9
  y = array[:,:,1]
  x = array[:,:,0] 
  result = [stats.mstats.theilslopes(y[i,...],x[i,...],alpha)[0] for i in range(0,N)]
  return result

result = regress_slope(array)
list(result)
print(result)

My "real" data includes invalid values and I have defined a threshold (<0.1) and tried to mask these values from the array. However, when I use the masked array it throws this error:

array2 = ma.masked_less_equal(array, 0.1)
result2 = regress_slope(array2)
list(result2)

IndexError: index 0 is out of bounds for axis 0 with size 0

I am not sure what this error message means, but I think it might be because there are not enough unmasked values to compute the regression? If this is the case, how could I adjust the code to return nan in the result?

Solution

You are correct in that the function stats.mstats.theilslopes fails with that error message if there are not enough unmasked values to compute the regression.

A minimal example:

# this works
a = ma.masked_array([1, 2], mask=[0, 0])
b = ma.masked_array([1, 2], mask=[0, 0])
stats.mstats.theilslopes(a, b, 0.95)

# but this does not
b = ma.masked_array([1, 2], mask=[0, 1])
stats.mstats.theilslopes(a, b, 0.95)

The error message indicates that, somewhere in the process of the computation, it tries to access the 1st element on the 1st axis of a result that has no elements.

I don't know enough about the theory of what you're trying to do to know if the result is useful, but this will fix your immediate problem:

import numpy as np
from scipy import stats
import numpy.ma as ma

np.random.seed(0)
a = np.random.random((4, 3, 2))


def regress_slope(arr):
    def safe_first_theilslopes(arr1, arr2, a):
        try:
            return stats.mstats.theilslopes(arr1, arr2, a)[0]
        except IndexError:
            return np.NaN

    n = arr.shape[0]
    alpha = 0.9
    y = arr[:, :, 1]
    x = arr[:, :, 0]
    return [safe_first_theilslopes(y[i, ...], x[i, ...], alpha) for i in range(0, n)]


result = regress_slope(a)
print(result)

a2 = ma.masked_less_equal(a, 0.1)
result2 = regress_slope(a2)
print(result2)

Note how I have the function return either the first element of the function result (stats.mstats.theilslopes(arr1, arr2, a)[0]) or np.NaN, so that is now baked in to that function.

This code works, but throws a few warnings you could suppress but should probably look into first:

RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
\lib\site-packages\numpy\core\_methods.py:161: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)