Tags: python, numpy, count, overlap

Intersection between a 1D array and every row of a 2D array? Overlap count?


You can use numpy.intersect1d(a1, a2) to intersect two arrays, and the docs show a way to intersect multiple arrays:

reduce(np.intersect1d, ([1, 3, 4, 3], [3, 1, 2, 1], [6, 3, 4, 2]))
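For anyone running the line above on Python 3, reduce has to be imported from functools. A minimal runnable sketch (the variable name common is only illustrative):

```python
from functools import reduce

import numpy as np

# Fold intersect1d over the three lists: ([1,3,4,3] ∩ [3,1,2,1]) ∩ [6,3,4,2]
common = reduce(np.intersect1d, ([1, 3, 4, 3], [3, 1, 2, 1], [6, 3, 4, 2]))
print(common)  # [3]
```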

What I want to do is to find the intersection between a 1D array and every row in the corresponding 2D array.

Or better yet just the COUNT of the overlapping elements in every row.

I know I can do that with intersect1d() and a loop, but it will be too slow.

How can we count the overlapping elements in every row the NumPy way?


Example (the number to the right of each row is the desired overlap count with a1):

In [59]: a2 = np.random.choice(np.arange(0,100),(10,5), replace=False)

In [60]: a2
Out[60]: 
array([[50,  5, 25, 40, 19],  1
       [43, 37, 21, 55, 11],  0
       [16, 49,  6, 86, 96],  0
       [80, 66, 87, 51, 64],  0
       [42,  7, 20, 24, 74],  1
       [92, 63, 75, 54, 90],  2
       [ 9, 91, 88, 85, 22],  0
       [ 4, 65, 97, 93, 53],  0
       [18,  0, 57, 71, 76],  0
       [94,  1, 77, 89, 45]]) 0

In [61]: a1 = np.random.choice(np.arange(0,100),5, replace=False)


In [63]: a1
Out[63]: array([63, 54, 20, 60, 25])
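For reference, the loop-based baseline mentioned above could look roughly like this on the example arrays. It is only a sketch of the approach I want to avoid, and the name loop_counts is made up for illustration:

```python
import numpy as np

a1 = np.array([63, 54, 20, 60, 25])
a2 = np.array([[50,  5, 25, 40, 19],
               [43, 37, 21, 55, 11],
               [16, 49,  6, 86, 96],
               [80, 66, 87, 51, 64],
               [42,  7, 20, 24, 74],
               [92, 63, 75, 54, 90],
               [ 9, 91, 88, 85, 22],
               [ 4, 65, 97, 93, 53],
               [18,  0, 57, 71, 76],
               [94,  1, 77, 89, 45]])

# One intersect1d call per row: correct, but the Python-level loop scales poorly
loop_counts = np.array([len(np.intersect1d(a1, row)) for row in a2])
print(loop_counts)  # [1 0 0 0 1 2 0 0 0 0]
```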

Solution

  • To get just the count of common elements per row, build a boolean mask of matches with np.isin and sum it along each row (arr2D and arr1D correspond to a2 and a1 in the question):

    np.isin(arr2D,arr1D).sum(axis=1)
    

    If each matching value should be counted only once per row even when it occurs several times in that row, and the input elements are positive integers (0 is used below to mark non-matches), a few more steps are needed:

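    # https://stackoverflow.com/a/46256361/ @Divakar
    def bincount2D_vectorized(a):
        # Shift each row into its own block of N bins so a single bincount covers all rows
        N = a.max()+1
        a_offs = a + np.arange(a.shape[0])[:,None]*N
        return np.bincount(a_offs.ravel(), minlength=a.shape[0]*N).reshape(-1,N)

    # Non-matching entries become 0; drop bin 0, then count the non-empty bins per row
    count = (bincount2D_vectorized(np.isin(arr2D,arr1D)*arr2D)[:,1:]!=0).sum(1)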
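To make the difference between the two counts concrete, here is a small self-contained sketch. The input arrays are made up for illustration (they are not the arrays from the question) and contain repeated values on purpose; the names count_all and count_unique are likewise illustrative.

```python
import numpy as np

def bincount2D_vectorized(a):
    # Per-row bincount: shift row i by i*N so every row gets its own block of bins
    N = a.max() + 1
    a_offs = a + np.arange(a.shape[0])[:, None] * N
    return np.bincount(a_offs.ravel(), minlength=a.shape[0] * N).reshape(-1, N)

arr1D = np.array([63, 54, 20, 60, 25])
arr2D = np.array([[50,  5, 25, 40, 25],    # 25 appears twice
                  [43, 37, 21, 55, 11],    # no match
                  [92, 63, 63, 54, 90]])   # 63 twice, 54 once

mask = np.isin(arr2D, arr1D)

# Every matching entry counts, duplicates included
count_all = mask.sum(axis=1)
print(count_all)     # [2 0 3]

# Each matching value counts once per row (assumes positive integer inputs)
count_unique = (bincount2D_vectorized(mask * arr2D)[:, 1:] != 0).sum(axis=1)
print(count_unique)  # [1 0 2]
```

When no row contains repeated values, as in the question's example, both counts agree and the plain np.isin(...).sum(axis=1) version is enough.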