Search code examples
pythonnumpymaxsparse-matrixdot-product

SciPy - Generalization of dot product over sparse and dense matrix


Assume normal dot product:

M3[i,k] = sum_j(M1[i,j] * M2[j,k])

Now I would like to replace the sum by sum other operation, say the maximum:

M3[i,k] = max_j(M1[i,j] * M2[j,k])

This question is parallel to Numpy: Dot product with max instead of sum

Only now consider that the solutions

M3 = np.sum(M1[:,:,None]*M2[None,:,:], axis=1)

or

M3 = np.max(M1[:,:,None]*M2[None,:,:], axis=1)

should refer to a dense matrix M1 and a sparse matrix M2. Unfortunately, 3d sparse matrices are not available in SciPy.

Basically, this would mean that in

M3[i,k] = max_j(M1[i,j] * M2[j,k])

we iterate only over j such that M2[j,k]!=0.

What is the most efficient way to solve this problem?


Solution

  • Here's an approach using one loop that iterated through the common axis of reduction -

    from scipy.sparse import csr_matrix
    import scipy as sp
    
    def reduce_after_multiply(M1, M2):
        # M1 : Nump array
        # M2 : Sparse matrix
        # Output : NumPy array
    
        # Get nonzero indices. Get start and stop indices representing 
        # intervaled indices along the axis of reduction containing 
        # the nonzero indices.
        r,c = sp.sparse.find(M2.T)[:2]
        IDs, start = np.unique(r,return_index=1)
        stop = np.append(start[1:], c.size)
    
        # Initialize output array and start loop for assigning values
        m, n = M1.shape[0], M2.shape[1]
        out = np.zeros((m,n))
        for iterID,i in enumerate(IDs):
    
            # Non zero indices for each col from M2. Use these to select 
            # M1's cols and M2's rows. Perform elementwise multiplication.
            idx = c[start[iterID]:stop[iterID]]
            mult = M1[:,idx]*M2.getcol(i).data
    
            # Use the inteneded ufunc along the second axis.
            out[:,i] = np.max(mult, axis=1) # Use any axis supported ufunc here
        return out
    

    Sample run for verification -

    In [248]: # Input data
         ...: M1 = np.random.rand(5,3)
         ...: M2 = csr_matrix(np.random.randint(0,3,(3,1000)))
         ...: 
         ...: # For variety, let's make one column as all zero. 
         ...: # This should result in corresponding col as all zeros as well.
         ...: M2[:,1] = 0
         ...: 
    
    In [249]: # Verify
         ...: out1 = np.max(M1[:,:,None]*M2.toarray()[None,:,:], axis=1)
    
    In [250]: np.allclose(out1, reduce_after_multiply(M1, M2))
    Out[250]: True
    

    Specifically for dot-product, we have a built-in dot method and as such is straight-forward with it. Thus, we can convert the first input which is dense array to a sparse matrix and then use sparse matrix's .dot method, like so -

    csr_matrix(M1).dot(M2)
    

    Let's verify this too -

    In [252]: # Verify
         ...: out1 = np.sum(M1[:,:,None]*M2.toarray()[None,:,:], axis=1)
    
    In [253]: out2 = csr_matrix(M1).dot(M2)
    
    In [254]: np.allclose(out1, out2.toarray())
    Out[254]: True