numpy nanmean in numba


I am trying to write a simpler version of numpy.nanmean for numba. Here's my code:

from numba import jit, prange
import numpy as np

@jit(nopython=True)
def nanmeanMY(a, axis=None):
    if a.ndim>1:
        ncols = a.shape[1]
        nrows = a.shape[0]
        a = a.T.flatten()
        res = np.zeros(ncols)
        for i in prange(ncols):
            col_no_nan = a[i*nrows:(i+1)*nrows]
            res[i] = np.mean(col_no_nan[~np.isnan(col_no_nan)])
        return res
    else:
        return np.mean(a[~np.isnan(a)])

The code is supposed to check whether the input is a vector or a matrix, and return the column-wise means if it is a matrix. Using a test matrix

X = np.array([[1,2], [3,4]])
nanmeanMY(X)

I get the following error:

Traceback (most recent call last):

  Cell In[157], line 1
    nanmeanMY(a)

  File ~\anaconda3\Lib\site-packages\numba\core\dispatcher.py:468 in _compile_for_args
    error_rewrite(e, 'typing')

  File ~\anaconda3\Lib\site-packages\numba\core\dispatcher.py:409 in error_rewrite
    raise e.with_traceback(None)

TypingError: No implementation of function Function(<built-in function getitem>) found for signature:
 
getitem(array(int32, 2d, C), array(bool, 2d, C))
 
There are 22 candidate implementations:
      - Of which 20 did not match due to:
      Overload of function 'getitem': File: <numerous>: Line N/A.
        With argument(s): '(array(int32, 2d, C), array(bool, 2d, C))':
       No match.
      - Of which 2 did not match due to:
      Overload in function 'GetItemBuffer.generic': File: numba\core\typing\arraydecl.py: Line 209.
        With argument(s): '(array(int32, 2d, C), array(bool, 2d, C))':
       Rejected as the implementation raised a specific error:
         NumbaTypeError: Multi-dimensional indices are not supported.
  raised from C:\Users\*****\anaconda3\Lib\site-packages\numba\core\typing\arraydecl.py:89

During: typing of intrinsic-call at C:\Users\*****\AppData\Local\Temp\ipykernel_10432\1652358289.py (22)

What is the problem here?


Solution

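  • The traceback points at the else branch: a[~np.isnan(a)] is being typed with a 2-D array, and Numba does not support indexing with a multi-dimensional boolean mask. Numba type-checks both branches of the if unless it can prune one at compile time, and apparently it can only treat a.ndim as a compile-time constant when a is never reassigned. Because a is rebound to a.T.flatten(), the else branch is not pruned for 2-D input and fails to type.

    Instead of reusing a, assign the flattened array to a new variable. The unused axis argument is also dropped here.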

    @jit(nopython=True)
    def nanmeanMY(a):
        if a.ndim > 1:
            ncols = a.shape[1]
            nrows = a.shape[0]
            a_flatten = a.T.flatten()  # Renamed a to a_flatten.
            res = np.zeros(ncols)
            for i in prange(ncols):
                col_no_nan = a_flatten[i * nrows : (i + 1) * nrows]  # Use a_flatten.
                res[i] = np.mean(col_no_nan[~np.isnan(col_no_nan)])
            return res
        else:
            return np.mean(a[~np.isnan(a)])
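
    As an alternative, here is a minimal sketch that avoids boolean-mask indexing and the a.ndim branch altogether by splitting the work into two functions with explicit loops. The names nanmean_1d and nanmean_cols are hypothetical, and the try/except fallback is an assumption added only so the snippet also runs where Numba is not installed. It is not meant as a full replacement for numpy.nanmean.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python when Numba is unavailable.
    def njit(func):
        return func


@njit
def nanmean_1d(a):
    # Mean of the non-NaN entries of a 1-D array, using no boolean mask.
    total = 0.0
    count = 0
    for x in a:
        if not np.isnan(x):
            total += x
            count += 1
    # Return NaN for an empty or all-NaN input instead of dividing by zero.
    return total / count if count > 0 else np.nan


@njit
def nanmean_cols(a):
    # Column-wise NaN-ignoring mean of a 2-D array.
    res = np.empty(a.shape[1])
    for j in range(a.shape[1]):
        res[j] = nanmean_1d(a[:, j])
    return res


X = np.array([[1.0, np.nan], [3.0, 4.0]])
print(nanmean_cols(X))                        # column means, NaN ignored
print(np.nanmean(X, axis=0))                  # NumPy reference for comparison
print(nanmean_1d(np.array([1.0, np.nan, 5.0])))
```

    Because each function only ever sees one dimensionality, nothing depends on Numba pruning a branch, and the caller picks nanmean_1d or nanmean_cols explicitly.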