Tags: python, numpy, sum, nan

Python: make np.nansum() return NaN instead of 0 when every summed value is NaN


Context: I have a 3D NumPy array arr of shape (3, 10, 10), and a list grouped_indices whose entries are arrays of indices along the 0th dimension of arr. For each group, I want to sum the corresponding slices of arr and store the results in a host array, host_arr.

Problem: I am using np.nansum(), but the sum of two NaNs gives me 0 and I would like it to return NaN. I don't want to set all the zeros to NaN after calculating the sum.
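
A minimal sketch of the behaviour described above, using two small hypothetical arrays a and b (illustrative only, not part of the real data). np.nansum treats NaN as zero, so a cell that is NaN in every input comes out as 0:

```python
import numpy as np

# Hypothetical 1D inputs: index 0 is NaN in both arrays,
# indices 1 and 2 are NaN in only one of them.
a = np.array([np.nan, 1.0, np.nan])
b = np.array([np.nan, np.nan, 2.0])

# Sum across the two arrays, ignoring NaNs
total = np.nansum([a, b], axis=0)
print(total)  # index 0 is 0.0 rather than NaN
```

The 0 at index 0 is indistinguishable from a genuine sum of zero, which is why blanking every zero after the sum is not an option.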

Question: How can I calculate the nansum of n 2D arrays (all of the same shape), but set to NaN any cell for which all the arrays have a NaN?

Example:

import numpy as np
import matplotlib.pyplot as plt

# Generate example data
np.random.seed(0)
arr_shape = (10, 10)
num_arrays = 3

# Create a 3D numpy array with random values
arr = np.random.rand(num_arrays, *arr_shape)

# Introduce NaNs
arr[0, :5, :5] = np.nan
arr[1, 2:7, 2:7] = np.nan
arr[2] = np.nan
arr[2, :2, :2] = 10

# Generate a list of arrays containing indices of the 0th dimension of arr
grouped_indices = [np.array([0,1]), np.array([0,1,2])]

# Create a host array that is the sum of grouped_indices slices
host_arr = np.array([np.nansum(arr[indices], axis=0) for indices in grouped_indices])

# Plot the nansums
plt.figure()
plt.imshow(host_arr[0]) # indices [2:5, 2:5] should be NaNs
plt.colorbar()
plt.figure()
plt.imshow(host_arr[1]) # indices [2:5, 2:5] should be NaNs too
plt.colorbar()

Solution

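  • IIUC, you can use np.isnan with np.all to find the cells where every array in a group is NaN, and then set those cells to NaN explicitly after np.nansum. This only touches the all-NaN cells, so genuine zero sums are left unchanged: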

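    # Plain nansum first: cells that are NaN in every array come out as 0
    host_arr = np.array([np.nansum(arr[indices], axis=0) for indices in grouped_indices])
    
    # Overwrite only the cells where every array in the group is NaN
    host_arr[0][np.all(np.isnan(arr[grouped_indices[0]]), axis=0)] = np.nan
    host_arr[1][np.all(np.isnan(arr[grouped_indices[1]]), axis=0)] = np.nan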
    

    With the example data, the cells [2:5, 2:5] of both host_arr[0] and host_arr[1] are then NaN, since every array in each group is NaN there.
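
For an arbitrary number of groups, the same idea can be wrapped in a small helper. This is a sketch only: nansum_keep_all_nan is a hypothetical name, not a NumPy function, and it uses np.where to pick NaN wherever the all-NaN mask is true instead of assigning in place.

```python
import numpy as np

def nansum_keep_all_nan(stack, axis=0):
    """Sum ignoring NaNs, but return NaN where every value along `axis` is NaN.

    Hypothetical helper for illustration; not part of NumPy.
    """
    stack = np.asarray(stack, dtype=float)
    all_nan = np.isnan(stack).all(axis=axis)
    return np.where(all_nan, np.nan, np.nansum(stack, axis=axis))

# Rebuild the example data from the question
np.random.seed(0)
arr = np.random.rand(3, 10, 10)
arr[0, :5, :5] = np.nan
arr[1, 2:7, 2:7] = np.nan
arr[2] = np.nan
arr[2, :2, :2] = 10
grouped_indices = [np.array([0, 1]), np.array([0, 1, 2])]

# One masked nansum per group, stacked into the host array
host_arr = np.array([nansum_keep_all_nan(arr[idx]) for idx in grouped_indices])
```

Computing the mask from the inputs rather than from the zeros in the result means a cell whose non-NaN values happen to sum to 0 stays 0.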