python optimization numpy nan scientific-computing

fast numpy addnan

I would like to add thousands of 4D arrays element-wise while accounting for NaNs. A simple example using 1D arrays would be:

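from numpy import array, nan

X = array([4, 7, 89, nan, 89, 65, nan])
Y = array([0, 5, 4, 9, 8, 100, nan])
# Plain X + Y returns nan wherever either input is nan.
# The result I want instead is:
# z = array([4, 12, 93, 9, 97, 165, nan])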

I've written a simple for loop around this, but it takes forever, so it is not a smart solution. Another option would be to stack everything into one larger array and use bottleneck's nansum, but that would take too much memory for my laptop. I need a running sum over 11000 cases.

Does anyone have a smart and fast way to do this?

Solution

Here is one possibility:

>>> import numpy as np
>>> x = np.array([1, 2, np.nan, 3, np.nan, 4])
>>> y = np.array([1, np.nan, 2, 5, np.nan, 8])
>>> x = np.ma.masked_array(np.nan_to_num(x), mask=np.isnan(x) & np.isnan(y))
>>> y = np.ma.masked_array(np.nan_to_num(y), mask=x.mask)
>>> (x+y).filled(np.nan)
array([  2.,   2.,   2.,   8.,  nan,  12.])

The real difficulty is that you seem to want nan to be interpreted as zero unless all values at a particular position are nan. This means that you must look at both x and y to determine which nans to replace. If you are okay with having all nan values replaced, then you can simply do np.nan_to_num(x) + np.nan_to_num(y).
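
For the running sum over many arrays mentioned in the question, the same idea can be applied incrementally: keep one accumulator with nan treated as zero, plus one boolean array recording the positions where every input so far was nan. The following is only a minimal sketch under that assumption; running_nansum is a hypothetical helper name, not a NumPy or bottleneck function, and it assumes all arrays have the same shape.

```python
import numpy as np

def running_nansum(arrays):
    """Sum equally shaped arrays, treating nan as zero unless
    every array is nan at that position (hypothetical helper)."""
    total = None    # running sum with nan counted as zero
    all_nan = None  # True where every array seen so far was nan
    for a in arrays:
        a = np.asarray(a, dtype=float)
        nan_here = np.isnan(a)
        cleaned = np.where(nan_here, 0.0, a)  # new array, input untouched
        if total is None:
            total = cleaned
            all_nan = nan_here.copy()
        else:
            total += cleaned
            all_nan &= nan_here
    if total is None:
        raise ValueError("need at least one array")
    # Restore nan only where no array contributed a real value.
    total[all_nan] = np.nan
    return total

X = np.array([4, 7, 89, np.nan, 89, 65, np.nan])
Y = np.array([0, 5, 4, 9, 8, 100, np.nan])
z = running_nansum([X, Y])
```

Because arrays can be any iterable, a generator that loads one case at a time should work, so only the accumulator, the boolean array, and the current input (plus a temporary copy) need to be in memory at once.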