python arrays numpy nan numba

Numba and NumPy nanmean on a 3D array - easy alternative?


I've spent hours converting a huge codebase to be Numba compatible, and apparently have one problem left to fix (at least until other Numba errors pop up, although this one is right near the end of the function). I end up with a 3D NumPy array in which NaNs fill any slots that have no data. The array holds dates and associated prices, and months have different numbers of workdays, so months with fewer workdays (February vs. January, for example) are padded with NaNs.

The array has this shape:

array3d.shape == (6, 8192, 22)

Since it contains NaNs, the simple solution is:

averages = np.nanmean(array3d, axis=2)

And I get back what I want. Each run of 22 days, less the NaNs, is averaged across the whole array, so I'm left with a 2D matrix of shape (6, 8192).

But Numba doesn't like this. It appears they only implement this function on a single value (or a 1D or 2D array? I have no idea: https://github.com/numba/numba/blob/master/numba/np/arraymath.py#L1092-L1109 ). So, like the rest of my code, must I write some crazy indexing to make Numba happy? I'm having some difficulty doing the math here with "nanmasks", even though it's just averaging where no NaNs exist...

nanmask = ~np.isnan(array3d.reshape(-1))  # flatten the array to 1D; True wherever the value isn't NaN

Great, now I know which values aren't NaNs (in a flat 1D array)... But what next? There has to be some way in Numba to make this simple math easier that I've overlooked. Any help is appreciated.
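
One way the mask idea could be carried through, sketched here in plain NumPy rather than Numba: keep the mask in the original 3D shape instead of flattening it, zero out the NaNs, sum along the last axis, and divide by the number of valid entries. The function name `nanmean_axis2_masked` and the small `demo` array are made up for illustration, and whether this exact form compiles under a given Numba version is not checked here.

```python
import numpy as np

def nanmean_axis2_masked(arr):
    """Mean over the last axis of a 3D array, ignoring NaNs (plain NumPy sketch)."""
    valid = ~np.isnan(arr)                          # True where a real value exists
    totals = np.where(valid, arr, 0.0).sum(axis=2)  # NaNs contribute 0 to the sum
    counts = valid.sum(axis=2)                      # number of real values per (i, j)
    out = np.full(totals.shape, np.nan)             # slices with no valid values stay NaN
    np.divide(totals, counts, out=out, where=counts > 0)
    return out

# Small stand-in for the (6, 8192, 22) array: 2 x 3 x 4 with some NaN padding.
demo = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
demo[0, 0, 3] = np.nan
demo[1, 2, 2:] = np.nan
result = nanmean_axis2_masked(demo)  # shape (2, 3), same values as np.nanmean(demo, axis=2)
```

Keeping the mask 3D is the point of the sketch: flattening to 1D discards the grouping along the last axis, which is exactly what the average needs.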


Solution

  • I wrote this Numba-compatible nanmean function for a 3D array after myrtlecat pointed out that Numba only implements nanmean over an entire array, with no axis argument. It runs in about 0.5 ms on my PC on the (6, 8192, 22) array, and its results match np.nanmean(array, axis=2):

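    import numpy as np
    import numba as nb

    # njit forces nopython mode, which parallel=True and prange require
    @nb.njit(cache=True, parallel=True, nogil=True)
    def nanmean3D(array):
        # One mean per (i, j) pair, taken over the last axis
        output = np.empty((array.shape[0], array.shape[1]))
        for i in nb.prange(array.shape[0]):  # outer loop runs in parallel
            for j in range(array.shape[1]):
                # Numba's np.nanmean takes no axis argument, so reduce one 1D slice at a time
                output[i, j] = np.nanmean(array[i, j, :])
        return output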
    

    Hopefully someone else having an issue with nanmean and Numba can use the above function on a 3D array or modify it for their use.
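
For anyone who wants to modify the function, here is a hedged variant that does not call np.nanmean at all and instead keeps a running sum and count for each (i, j) slice. It is written as plain Python so it runs without Numba installed. It uses only scalar arithmetic and indexing, so adding an `@nb.njit` decorator should be possible, but that is an assumption and is not tested here. The name `nanmean_last_axis_loops` is hypothetical.

```python
import numpy as np

def nanmean_last_axis_loops(array):
    """NaN-ignoring mean over axis 2 of a 3D array, using explicit loops."""
    n0, n1, n2 = array.shape
    output = np.empty((n0, n1))
    for i in range(n0):
        for j in range(n1):
            total = 0.0
            count = 0
            for k in range(n2):
                value = array[i, j, k]
                if not np.isnan(value):  # skip the NaN padding
                    total += value
                    count += 1
            # A slice with no valid values has no mean, so report NaN for it.
            output[i, j] = total / count if count > 0 else np.nan
    return output

# Tiny 2 x 2 x 3 example, including one slice that is entirely NaN.
sample = np.array([[[1.0, 2.0, np.nan], [np.nan, np.nan, np.nan]],
                   [[4.0, np.nan, 6.0], [7.0, 8.0, 9.0]]])
means = nanmean_last_axis_loops(sample)
```

Handling the all-NaN slice explicitly avoids a division by zero and makes that case return NaN by design.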