
Python - create mask of unique values in array


I have two NumPy arrays, x and y, each around 2M elements long. x is sorted, but some of its values repeat.

The task is to remove entries from both x and y wherever the value in x is repeated, so that each value of x appears only once. My idea is to create a boolean mask. Here is what I have done so far:

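import numpy as np

def createMask(x):
    # Start with every element kept (np.empty would leave the mask uninitialized).
    idx = np.ones(x.shape, dtype=bool)
    for i in range(len(x) - 1):
        # Drop x[i] if the next value is equal, so only the last of each run survives.
        if x[i+1] == x[i]:
            idx[i] = False
    return idx

idx = createMask(x)
x   = x[idx]
y   = y[idx]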
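A small sanity check of the loop-based mask on a toy array shows which element of each run of equal values it keeps. The data is made up for illustration, and this sketch initializes the mask with np.ones rather than np.empty so that untouched entries are reliably True:

```python
import numpy as np

def createMask(x):
    # Keep everything by default, then drop x[i] when it equals x[i+1].
    idx = np.ones(x.shape, dtype=bool)
    for i in range(len(x) - 1):
        if x[i + 1] == x[i]:
            idx[i] = False
    return idx

# Hypothetical toy data: x is sorted, y holds one value per entry of x.
x = np.array([1, 2, 2, 3, 3, 3, 4])
y = np.array([10, 20, 30, 40, 50, 60, 70])

idx = createMask(x)
print(idx)     # [ True False  True False False  True  True]
print(x[idx])  # [1 2 3 4]
print(y[idx])  # [10 30 60 70]
```

Because x[i] is dropped whenever its successor is equal, the last element of each run is the one that survives, so y keeps the value paired with the last duplicate.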

This method works, but it is slow (705 ms with %timeit), and it looks clumsy. Is there a more elegant and efficient way? (I'm sure there is.)

Update: best answers

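The second method is a list comprehension; comparing with != and appending True for the final element gives a mask the same length as x:

idx = [x[i+1] != x[i] for i in range(len(x)-1)] + [True]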

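And the third (and fastest) method compares x with itself shifted by one element in a single vectorized operation. The comparison is True where adjacent values are equal and has length len(x)-1, so it is inverted and padded with True for the final element before being used as a mask:

idx = x[:-1] == x[1:]          # True where x[i] == x[i+1]
idx = np.append(~idx, True)    # keep-mask of length len(x)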
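Wrapped as a reusable function, the same shifted-slice comparison can keep either the last or the first element of each run. The helper name unique_run_mask and its keep parameter are hypothetical, x is assumed to be 1-D with duplicates adjacent (e.g. sorted), and the toy arrays are made up:

```python
import numpy as np

def unique_run_mask(x, keep="last"):
    """Boolean mask of len(x) keeping one element per run of equal values.

    Hypothetical helper: assumes x is 1-D and duplicates are adjacent.
    """
    if x.size == 0:
        return np.zeros(0, dtype=bool)
    differs = x[:-1] != x[1:]  # True where x[i] != x[i+1]; length len(x)-1
    if keep == "last":
        # x[i] ends its run if the next value differs; the final element always does.
        return np.append(differs, True)
    # x[i] starts its run if the previous value differs; the first element always does.
    return np.concatenate(([True], differs))

x = np.array([1, 2, 2, 3, 3, 3, 4])
y = np.array([10, 20, 30, 40, 50, 60, 70])

idx = unique_run_mask(x)
print(x[idx], y[idx])              # [1 2 3 4] [10 30 60 70]

idx_first = unique_run_mask(x, keep="first")
print(x[idx_first], y[idx_first])  # [1 2 3 4] [10 20 40 70]
```

Both variants leave the same values in x; they differ only in which y value is paired with each repeated x.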

The results (using IPython's %timeit) are:

First method: 751ms

Second method: 618ms

Third method: 3.63ms
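To reproduce the comparison outside IPython, a sketch with the standard timeit module might look like the following. Here the list-comprehension and slice versions are written with != plus a trailing True so that all three return the same full-length keep-mask (an adaptation, not the exact one-liners timed above); the array size, value range and seed are assumptions, and absolute timings will depend on the machine:

```python
import timeit
import numpy as np

def mask_loop(x):
    # First method: explicit Python loop over adjacent pairs.
    idx = np.ones(x.shape, dtype=bool)
    for i in range(len(x) - 1):
        if x[i + 1] == x[i]:
            idx[i] = False
    return idx

def mask_listcomp(x):
    # Second method: list comprehension, plus True for the final element.
    return np.array([x[i + 1] != x[i] for i in range(len(x) - 1)] + [True])

def mask_vectorized(x):
    # Third method: one elementwise comparison of shifted slices.
    return np.append(x[:-1] != x[1:], True)

# Assumed test data: sorted random integers with many repeats.
rng = np.random.default_rng(0)
x = np.sort(rng.integers(0, 1000, size=200_000))

# All three masks should agree before their speed is compared.
assert np.array_equal(mask_loop(x), mask_listcomp(x))
assert np.array_equal(mask_loop(x), mask_vectorized(x))

for f in (mask_loop, mask_listcomp, mask_vectorized):
    seconds = min(timeit.repeat(lambda: f(x), number=1, repeat=3))
    print(f"{f.__name__}: {seconds * 1e3:.2f} ms")
```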

Credit to mtitan8 for both methods.


Solution

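  • I believe the fastest method is to compare x with a copy of itself shifted by one element, using NumPy's elementwise == operator; the result is True wherever an element equals its successor: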

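    idx = x[:-1] == x[1:]          # True where x[i] == x[i+1]; length len(x)-1
    keep = np.append(~idx, True)   # invert and keep the last element: use x[keep], y[keep]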
    

    On my machine, with x holding one million random integers in [0, 100]:

    In[15]: timeit idx = x[:-1] == x[1:]
    1000 loops, best of 3: 1 ms per loop
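The shifted comparison only detects duplicates that sit next to each other, which is why it relies on x being sorted, as in the question. A quick check, using assumed sorted random data of the same size and range as the timing above, that the resulting mask leaves exactly the values np.unique returns:

```python
import numpy as np

rng = np.random.default_rng(42)
# Assumed data: one million sorted integers in [0, 100].
x = np.sort(rng.integers(0, 101, size=1_000_000))
y = rng.random(x.size)  # arbitrary values paired with x

keep = np.append(x[:-1] != x[1:], True)  # keep the last element of each run
x_unique, y_unique = x[keep], y[keep]

# For sorted input, one element per run is exactly the set of distinct values.
assert np.array_equal(x_unique, np.unique(x))
print(x_unique.size)  # number of distinct values (at most 101)
```

If x were not sorted, np.unique(x, return_index=True) returns the index of the first occurrence of each distinct value, which could be used to index both x and y instead.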