
Convert NaN values to zero


I have a 2D numpy array. Some of the values in this array are NaN. I want to perform certain operations using this array. For example, consider the array:

[[   0.   43.   67.    0.   38.]
 [ 100.   86.   96.  100.   94.]
 [  76.   79.   83.   89.   56.]
 [  88.   NaN   67.   89.   81.]
 [  94.   79.   67.   89.   69.]
 [  88.   79.   58.   72.   63.]
 [  76.   79.   71.   67.   56.]
 [  71.   71.   NaN   56.  100.]]

I am trying to take each row, one at a time, sort it in descending order to get the 3 largest values in the row, and take their average. The code I tried is:

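# nparr is a 2D numpy array
averages = []
for entry in nparr:
    # sorted() compares with <, which is always False when NaN is involved, so rows
    # containing NaN are not reliably ordered (and a NaN in the top 3 makes the average NaN)
    sortedentry = sorted(entry, reverse=True)
    highest_3_values = sortedentry[:3]
    avg_highest_3 = float(sum(highest_3_values)) / 3
    averages.append(avg_highest_3)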

This does not work for rows containing NaN. My question is: is there a quick way to convert all NaN values in the 2D numpy array to zero, so that I have no problems with sorting and the other things I am trying to do?


Solution

  • This should work:

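    import numpy as np
    
    # NaN is spelled np.nan (the np.NaN alias was removed in NumPy 2.0)
    a = np.array([[1, 2, 3], [0, 3, np.nan]])
    # Boolean mask: True wherever a holds NaN
    where_are_NaNs = np.isnan(a)
    # Set every masked element to 0, modifying a in place
    a[where_are_NaNs] = 0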
    

    In the above case where_are_NaNs is:

    In [12]: where_are_NaNs
    Out[12]: 
    array([[False, False, False],
           [False, False,  True]], dtype=bool)
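Applied to the question itself, the same mask replacement can be followed by a vectorized version of the row-wise top-3 average. This is a sketch that, as the question asks, treats a missing value as 0; np.sort(..., axis=1) stands in for the Python-level loop and sorted() call, and the name top3 is chosen here for illustration.

```python
import numpy as np

# The 2D array from the question, with NaN written as np.nan
nparr = np.array([
    [0, 43, 67, 0, 38],
    [100, 86, 96, 100, 94],
    [76, 79, 83, 89, 56],
    [88, np.nan, 67, 89, 81],
    [94, 79, 67, 89, 69],
    [88, 79, 58, 72, 63],
    [76, 79, 71, 67, 56],
    [71, 71, np.nan, 56, 100],
])

# Replace every NaN with 0 in place, using a boolean mask as in the answer
nparr[np.isnan(nparr)] = 0

# np.sort sorts each row in ascending order (axis=1), so the last 3 columns
# hold the 3 largest values of each row
top3 = np.sort(nparr, axis=1)[:, -3:]

# Average the 3 largest values row by row: one result per row
avg_highest_3 = top3.mean(axis=1)

print(avg_highest_3)
```

Because np.sort sorts in ascending order, the slice [:, -3:] picks the three largest values, and mean(axis=1) produces one average per row without an explicit loop.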
    

    A note on efficiency: the timings below were measured with IPython's %timeit magic and NumPy 1.21.2.

    >>> import numpy as np
    >>> aa = np.random.random(1_000_000)
    >>> a = np.where(aa < 0.15, np.nan, aa)
    >>> %timeit a[np.isnan(a)] = 0
    536 µs ± 8.11 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
    >>> a = np.where(aa < 0.15, np.nan, aa)
    >>> %timeit np.where(np.isnan(a), 0, a)
    2.38 ms ± 27.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    >>> a = np.where(aa < 0.15, np.nan, aa)
    >>> %timeit np.nan_to_num(a, copy=True)
    8.11 ms ± 401 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    >>> a = np.where(aa < 0.15, np.nan, aa)
    >>> %timeit np.nan_to_num(a, copy=False)
    3.8 ms ± 70.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    

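    In these measurements, a[np.isnan(a)] = 0 was the fastest. Note, however, that it and np.nan_to_num(a, copy=False) modify a in place, so within each %timeit run only the first loop actually finds NaNs to replace, which can make both look faster than a single call on fresh data.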
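A sketch of a comparison that times each approach on a fresh copy of the data, so every call has the same NaNs to replace. It uses the standard-library timeit module instead of %timeit, and the helper names (mask_assign, where_copy, nan_to_num_copy, nan_to_num_inplace) are hypothetical, chosen for this illustration. The copy cost is included in every measurement, and results depend on the machine and NumPy version, so no numbers are claimed here.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
aa = rng.random(1_000_000)
base = np.where(aa < 0.15, np.nan, aa)  # roughly 15% NaN, as in the timings above


def mask_assign(a):
    a[np.isnan(a)] = 0  # modifies a in place
    return a


def where_copy(a):
    return np.where(np.isnan(a), 0, a)  # returns a new array


def nan_to_num_copy(a):
    return np.nan_to_num(a, copy=True)  # returns a new array


def nan_to_num_inplace(a):
    return np.nan_to_num(a, copy=False)  # modifies a in place


approaches = {
    "mask assignment": mask_assign,
    "np.where": where_copy,
    "nan_to_num(copy=True)": nan_to_num_copy,
    "nan_to_num(copy=False)": nan_to_num_inplace,
}

# All approaches agree on this data (it contains NaN but no infinities)
expected = mask_assign(base.copy())
for name, func in approaches.items():
    assert np.array_equal(func(base.copy()), expected), name

# Each call gets its own copy, so the in-place variants always see NaNs.
# Copying is included for every approach: compare differences, not absolutes.
runs = 20
for name, func in approaches.items():
    seconds = timeit.timeit(lambda f=func: f(base.copy()), number=runs) / runs
    print(f"{name:>24}: {seconds * 1e3:.2f} ms per call")
```

Beyond speed, the approaches differ in behavior: np.where and nan_to_num(copy=True) leave the input unchanged, while the other two modify it; and np.nan_to_num by default also replaces inf and -inf with very large finite values, which the mask assignment does not do.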