I have a 2D numpy array. Some of the values in this array are NaN. I want to perform certain operations using this array. For example, consider the array:
[[  0.  43.  67.   0.  38.]
 [100.  86.  96. 100.  94.]
 [ 76.  79.  83.  89.  56.]
 [ 88.  NaN  67.  89.  81.]
 [ 94.  79.  67.  89.  69.]
 [ 88.  79.  58.  72.  63.]
 [ 76.  79.  71.  67.  56.]
 [ 71.  71.  NaN  56. 100.]]
I am trying to take each row, one at a time, sort it in descending order to get the 3 largest values, and take their average. The code I tried is:
# nparr is a 2D numpy array
for entry in nparr:
    sortedentry = sorted(entry, reverse=True)
    highest_3_values = sortedentry[:3]
    avg_highest_3 = float(sum(highest_3_values)) / 3
This does not work for rows containing NaN. My question is: is there a quick way to convert all NaN values to zero in the 2D numpy array, so that I have no problems with sorting and the other things I am trying to do?
This should work:
import numpy as np

a = np.array([[1, 2, 3], [0, 3, np.nan]])
where_are_NaNs = np.isnan(a)
a[where_are_NaNs] = 0
In the above case, where_are_NaNs is:
In [12]: where_are_NaNs
Out[12]:
array([[False, False, False],
[False, False, True]], dtype=bool)
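With the NaNs zeroed out, the original per-row task becomes a couple of lines. A minimal sketch (the two-row array is just an excerpt of the one in the question, and np.where is used here so the input array is left untouched):

import numpy as np

nparr = np.array([[0., 43., 67., 0., 38.],
                  [88., np.nan, 67., 89., 81.]])

cleaned = np.where(np.isnan(nparr), 0, nparr)  # NaNs replaced by 0
top3 = np.sort(cleaned, axis=1)[:, -3:]        # three largest values per row
avg_highest_3 = top3.mean(axis=1)              # row-wise average of those three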
A note on efficiency. The examples below were run with numpy 1.21.2:
>>> aa = np.random.random(1_000_000)
>>> a = np.where(aa < 0.15, np.nan, aa)
>>> %timeit a[np.isnan(a)] = 0
536 µs ± 8.11 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
>>> a = np.where(aa < 0.15, np.nan, aa)
>>> %timeit np.where(np.isnan(a), 0, a)
2.38 ms ± 27.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> a = np.where(aa < 0.15, np.nan, aa)
>>> %timeit np.nan_to_num(a, copy=True)
8.11 ms ± 401 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> a = np.where(aa < 0.15, np.nan, aa)
>>> %timeit np.nan_to_num(a, copy=False)
3.8 ms ± 70.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Consequently, a[np.isnan(a)] = 0 is the fastest of these options.
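As an aside (assuming numpy >= 1.17, where nan_to_num gained keyword arguments), the replacement value does not have to be zero:

import numpy as np

a = np.array([1.0, np.nan, 3.0])
filled = np.nan_to_num(a, nan=0.0)  # returns a copy with NaN replaced; pass copy=False to modify a in place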