Determine sum of numpy array while excluding certain values

I would like to determine the sum of a two dimensional numpy array. However, elements with a certain value I want to exclude from this summation. What is the most efficient way to do this?

For example, here I initialize a two dimensional numpy array of 1s and replace several of them by 2:

import numpy

data_set = numpy.ones((10, 10))

data_set[4][4] = 2
data_set[5][5] = 2
data_set[6][6] = 2

How can I sum over the elements in my two dimensional array while excluding all of the 2s? Note that with the 10 by 10 array the correct answer should be 97 as I replaced three elements with the value 2.

I know I can do this with nested for loops. For example:

elements = []
for idx_x in range(data_set.shape[0]):
  for idx_y in range(data_set.shape[1]):
    if data_set[idx_x][idx_y] != 2:
      elements.append(data_set[idx_x][idx_y])

data_set_sum = numpy.sum(elements)

However on my actual data (which is very large) this is too slow. What is the correct way of doing this?

Solution

Use numpy's capability of indexing with boolean arrays. In the below example data_set!=2 evaluates to a boolean array which is True whenever the element is not 2 (and has the correct shape). So data_set[data_set!=2] is a fast and convenient way to get an array which doesn't contain a certain value. Of course, the boolean expression can be more complex.

In [1]: import numpy as np
In [2]: data_set = np.ones((10, 10))
In [4]: data_set[4,4] = 2
In [5]: data_set[5,5] = 2
In [6]: data_set[6,6] = 2
In [7]: data_set[data_set != 2].sum()
Out[7]: 97.0
In [8]: data_set != 2
Out[8]: 
array([[ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       ...
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True]], dtype=bool)