python numpy binary-search masked-array

What am I doing wrong with numpy searchsorted?


I'm seeing some odd behavior in numpy.searchsorted. The following test fails:

import numpy as np

a = np.ma.masked_array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
                        17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
                        31, 32, 33, 0],
                       mask=[False, False, False, False, False, False, False,
                             False, False, False, False, False, False, False,
                             False, False, False, False, False, False, False,
                             False, False, False, False, False, False, False,
                             False, False, False, False, False,  True],
                       fill_value=0, dtype='uint8')

b = np.array([1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
              17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 33],
             dtype='uint8')

expected = np.array([0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13,
                     14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
                     28, 29, 32])

c = a.searchsorted(b)

np.testing.assert_array_equal(c, expected)

The last entry in c is 34, and I don't know why. A similar but smaller test passes:

aa = np.ma.masked_array([1, 2, 3, 4, 0],
                        mask=[False, False, False, False, True],
                        fill_value=0, dtype='uint8')

bb = np.array([1, 3, 4], dtype='uint8')

expectedd = np.array([0, 2, 3])

cc = aa.searchsorted(bb)

np.testing.assert_array_equal(cc, expectedd)

The numpy.searchsorted documentation describes the function as follows:

Find the indices into a sorted array a such that, if the corresponding elements in v were inserted before the indices, the order of a would be preserved.
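
For reference, here is a small sketch of what that description means on an ordinary, fully sorted array. The array sorted_values and the result names are made up for illustration and are not part of the failing test.

```python
import numpy as np

# A plain (unmasked) array that really is sorted in ascending order.
sorted_values = np.array([1, 2, 3, 4, 5])

# Default side='left': index of the first position where 3 could be inserted.
idx_left = np.searchsorted(sorted_values, 3)

# side='right': index just past any existing entries equal to 3.
idx_right = np.searchsorted(sorted_values, 3, side='right')

# Several values can be looked up in one call.
multi = np.searchsorted(sorted_values, [-10, 10, 2, 3])

# Inserting at the returned index keeps the array sorted.
with_insert = np.insert(sorted_values, idx_left, 3)

print(idx_left, idx_right, multi, with_insert)
```

The guarantee only holds when the array being searched is already sorted; on unsorted input the returned indices are not meaningful.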


Solution

  • np.searchsorted doesn't yet support masked arrays (see here for a list of supported methods).

    You can get the expected result by manually indexing a with the inverse of a.mask, then passing the result as the first argument to np.searchsorted:

    c = np.searchsorted(a[~a.mask], b)
    
    # or alternatively, a[~a.mask].searchsorted(b)
    
    print(np.allclose(c, expected))
    # True
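
A likely explanation for the stray 34 is that the search runs on the underlying data and ignores the mask. That data ends with the masked 0, so it is not sorted and the binary search can land anywhere. The sketch below rebuilds the question's arrays more compactly with range, checks whether the raw data is sorted, and uses compressed() as an alternative to a[~a.mask]. The names raw, raw_is_sorted and valid are illustrative only.

```python
import numpy as np

# Same shape as the question's data: 1..33 followed by a masked trailing 0.
values = list(range(1, 34)) + [0]
mask = [False] * 33 + [True]
a = np.ma.masked_array(values, mask=mask, fill_value=0, dtype='uint8')

b = np.array(list(range(1, 31)) + [33], dtype='uint8')
expected = np.array(list(range(0, 30)) + [32])

# The raw buffer still contains the masked 0 at the end, so it is not sorted.
raw = a.data
raw_is_sorted = bool(np.all(raw[:-1] <= raw[1:]))
print(raw_is_sorted)

# compressed() returns only the unmasked entries as a plain 1-D ndarray,
# which is sorted here and therefore safe to pass to np.searchsorted.
valid = a.compressed()
c = np.searchsorted(valid, b)

np.testing.assert_array_equal(c, expected)
```

Note that the indices returned this way refer to positions in the compressed array. They match positions in a only when the masked entries come after every value being searched for, as they do here.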