Search code examples
pythonindexingmask

python IndexError for masked array


I have data formatted as a 3-dimensional array, and I am performing a regression for each element along one of the axes. The following code works as I expect and returns a list of the regression slopes.

import numpy as np
from scipy import stats
import numpy.ma as ma

#make array
np.random.seed(0)
array = np.random.random((4,3,2))

def regress_slope(array):
  N=array.shape[0]
  alpha=0.9
  y = array[:,:,1]
  x = array[:,:,0] 
  result = [stats.mstats.theilslopes(y[i,...],x[i,...],alpha)[0] for i in range(0,N)]
  return result

result = regress_slope(array)
list(result)
print(result)

My "real" data includes invalid values and I have defined a threshold (<0.1) and tried to mask these values from the array. However, when I use the masked array it throws this error:

array2 = ma.masked_less_equal(array, 0.1)
result2 = regress_slope(array2)
list(result2)

IndexError: index 0 is out of bounds for axis 0 with size 0

I am not sure what this error message means, but I think it might be because there are not enough unmasked values to compute the regression? If this is the case, how could I adjust the code to return nan in the result?


Solution

  • You are correct in that the function stats.mstats.theilslopes fails with that error message if there are not enough unmasked values to compute the regression.

    A minimal example:

    # this works
    a = ma.masked_array([1, 2], mask=[0, 0])
    b = ma.masked_array([1, 2], mask=[0, 0])
    stats.mstats.theilslopes(a, b, 0.95)
    
    # but this does not
    b = ma.masked_array([1, 2], mask=[0, 1])
    stats.mstats.theilslopes(a, b, 0.95)
    

    The error message indicates that, somewhere in the process of the computation, it tries to access the 1st element on the 1st axis of a result that has no elements.

    I don't know enough about the theory of what you're trying to do to know if the result is useful, but this will fix your immediate problem:

    import numpy as np
    from scipy import stats
    import numpy.ma as ma
    
    np.random.seed(0)
    a = np.random.random((4, 3, 2))
    
    
    def regress_slope(arr):
        def safe_first_theilslopes(arr1, arr2, a):
            try:
                return stats.mstats.theilslopes(arr1, arr2, a)[0]
            except IndexError:
                return np.NaN
    
        n = arr.shape[0]
        alpha = 0.9
        y = arr[:, :, 1]
        x = arr[:, :, 0]
        return [safe_first_theilslopes(y[i, ...], x[i, ...], alpha) for i in range(0, n)]
    
    
    result = regress_slope(a)
    print(result)
    
    a2 = ma.masked_less_equal(a, 0.1)
    result2 = regress_slope(a2)
    print(result2)
    

    Note how I have the function return either the first element of the function result (stats.mstats.theilslopes(arr1, arr2, a)[0]) or np.NaN, so that is now baked in to that function.

    This code works, but throws a few warnings you could suppress but should probably look into first:

    RuntimeWarning: Mean of empty slice.
      out=out, **kwargs)
    \lib\site-packages\numpy\core\_methods.py:161: RuntimeWarning: invalid value encountered in double_scalars
      ret = ret.dtype.type(ret / rcount)