Broadcasted row membership between 2d arrays


Suppose we have the following 2d arrays:

>>> A
array([[1, 1],
       [2, 2],
       [3, 1]])

>>> B
array([[2, 1],
       [1, 2],
       [3, 1],
       [4, 2]])

I want to test the membership of the rows of A in the rows of B. For a single row of A, we can test its membership in B with:

np.any(np.all(A[index] == B, axis=1))

I want to do this for all rows of A at once without looping over the indices. The result should be:

desired_result = array([False, False, True])

How do we retrieve this result in a broadcasted way (without looping over rows of A)?


Solution

  • As you suspected correctly, you can use broadcasting to compare each row of A to every row of B in a vectorized fashion:

    out = (A == B[:, None]).all(axis=-1).any(axis=0)
    
    >>> out
    array([False, False,  True])
    

    Explanation

    To better understand how this works, let's use a modified problem:

    A = np.array([
        [4, 2],
        [1, 1],
        [2, 2],
        [3, 1]])
    
    B = np.array([
        [2, 1],
        [4, 2],
        [1, 2],
        [3, 1],
        [4, 2]])
    

    where we expect to find A[0] ([4, 2]) at rows 1 and 4 in B. Then:

    >>> (A == B[:, None]).all(axis=-1)
    array([[False, False, False, False],
           [ True, False, False, False],
           [False, False, False, False],
           [False, False, False,  True],
           [ True, False, False, False]])
    

    Each row of this result corresponds to a row of B and each column to a row of A. It shows that A[0] == B[1] and also A[0] == B[4] (first column), and that A[3] == B[3] (last column).

    At this point, .any(axis=0) reduces over the rows of B and finishes the job, giving array([ True, False, False,  True]) for this modified problem.
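
For reference, here is a minimal, self-contained sketch of the steps above, split into named intermediates so the shapes are easy to follow. The helper name `rows_in` is only illustrative and is not part of NumPy. The last lines cross-check the result against the per-row test from the question.

```python
import numpy as np

def rows_in(A, B):
    """Return a boolean mask where mask[j] is True if row A[j] equals some row of B."""
    # B[:, None] has shape (len(B), 1, ncols). Comparing it with A, of shape
    # (len(A), ncols), broadcasts to shape (len(B), len(A), ncols).
    elementwise = A == B[:, None]
    # Two rows are equal only if every column matches: shape (len(B), len(A)).
    row_matches = elementwise.all(axis=-1)
    # A row of A is a member if it matches at least one row of B: shape (len(A),).
    return row_matches.any(axis=0)

A = np.array([[4, 2], [1, 1], [2, 2], [3, 1]])
B = np.array([[2, 1], [4, 2], [1, 2], [3, 1], [4, 2]])

mask = rows_in(A, B)
print(mask)  # [ True False False  True]

# Cross-check against the looped, per-row test from the question.
looped = np.array([np.any(np.all(row == B, axis=1)) for row in A])
assert np.array_equal(mask, looped)
```

Note that the intermediate comparison holds len(B) * len(A) * ncols booleans, so for very large inputs the looped version may be the more memory-friendly choice.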