python numpy scipy

Find matching rows in 2 dimensional numpy array


I would like to get the indices of the rows in a 2-dimensional NumPy array that match a given row. For example, my array is this:

vals = np.array([[0, 0],
                 [1, 0],
                 [2, 0],
                 [0, 1],
                 [1, 1],
                 [2, 1],
                 [0, 2],
                 [1, 2],
                 [2, 2],
                 [0, 3],
                 [1, 3],
                 [2, 3],
                 [0, 0],
                 [1, 0],
                 [2, 0],
                 [0, 1],
                 [1, 1],
                 [2, 1],
                 [0, 2],
                 [1, 2],
                 [2, 2],
                 [0, 3],
                 [1, 3],
                 [2, 3]])

I would like to get the indices of the rows that match [0, 1], which are 3 and 15. When I do something like numpy.where(vals == [0, 1]), the comparison is done element by element, so I get...

(array([ 0,  3,  3,  4,  5,  6,  9, 12, 15, 15, 16, 17, 18, 21]), array([0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0]))

I want index array([3, 15]).


Solution

  • You need the np.where function to get the indexes:

    >>> np.where((vals == (0, 1)).all(axis=1))
    (array([ 3, 15]),)
    

    Or, as the documentation states:

    If only condition is given, return condition.nonzero()

    You could directly call .nonzero() on the array returned by .all:

    >>> (vals == (0, 1)).all(axis=1).nonzero()
    (array([ 3, 15]),)
    

    To break that down:

    >>> vals == (0, 1)
    array([[ True, False],
           [False, False],
           ...
           [ True, False],
           [False, False],
           [False, False]], dtype=bool)
    

    and calling the .all method on that array (with axis=1) gives you True for each row where both elements are True:

    >>> (vals == (0, 1)).all(axis=1)
    array([False, False, False,  True, False, False, False, False, False,
           False, False, False, False, False, False,  True, False, False,
           False, False, False, False, False, False], dtype=bool)
    

    and to get the indexes at which that result is True:

    >>> np.where((vals == (0, 1)).all(axis=1))
    (array([ 3, 15]),)
    

    or

    >>> (vals == (0, 1)).all(axis=1).nonzero()
    (array([ 3, 15]),)
    

    I find my solution a bit more readable, but as unutbu points out, the following may be faster, and it returns the same boolean array as (vals == (0, 1)).all(axis=1):

    >>> (vals[:, 0] == 0) & (vals[:, 1] == 1)
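
For reuse, the steps above could be wrapped up roughly as follows. This is only a sketch: `find_matching_rows` is a hypothetical helper name (not a NumPy function), the array is rebuilt with a comprehension to mirror `vals` from the question, and `np.flatnonzero` is used so the result is a plain index array rather than the 1-tuple that `np.where` returns.

```python
import numpy as np

def find_matching_rows(arr, row):
    """Return the indices of the rows of a 2-D array that equal `row`.

    Hypothetical helper for illustration; not part of NumPy.
    """
    arr = np.asarray(arr)
    # Broadcasting compares `row` against every row, element by element;
    # .all(axis=1) keeps only rows where every element matched.
    mask = (arr == np.asarray(row)).all(axis=1)
    # flatnonzero gives a 1-D index array instead of a 1-tuple.
    return np.flatnonzero(mask)

# Same layout as `vals` in the question: 12 distinct rows, repeated twice.
base = np.array([[x, y] for y in range(4) for x in range(3)])
vals = np.vstack([base, base])

idx = find_matching_rows(vals, (0, 1))
print(idx)  # [ 3 15]

# The column-wise form builds the same mask, so the indices agree.
columnwise = (vals[:, 0] == 0) & (vals[:, 1] == 1)
assert np.array_equal(np.flatnonzero(columnwise), idx)
```

The broadcast version works for rows of any width, while the column-wise version has to be written out per column, so which one is preferable likely depends on how many columns there are.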