
Numpy, inserting one matrix in another matrix efficiently?


I am trying to make an outer product of two vectors more efficient by removing zero elements, doing the outer product and then enlarging the resulting matrix with rows of zeros or inserting into a zero matrix. (Sparsifying the matrix using scipy does not really work since the cost for conversion is high and I am doing it over and over again.)

import numpy as np
dim = 100
vec = np.random.rand(1, dim)
mask = np.flatnonzero(vec > 0.8)
vec_sp = vec[:, mask]
mat_sp = vec_sp.T * vec_sp # This is faster than dot product
# Enlarge matrix or insert into zero matrix

Since it is the outer product of a vector with itself, I know which rows and columns of the original matrix are zero: their positions follow directly from the mask variable. To see this,

a = np.array(((1,0,2,0))).reshape(1,-1)
a.T * a
>> array([[1, 0, 2, 0],
       [0, 0, 0, 0],
       [2, 0, 4, 0],
       [0, 0, 0, 0]])

I have tried two different solutions. The first uses numpy's insert and append methods on the mat_sp variable, but the whole thing becomes a for loop and is really slow.

for val in mask:
    if val < mat_sp.shape[0]:
        mat_sp = np.insert(mat_sp, val, values=0, axis=1)
        mat_sp = np.insert(mat_sp, val, values=0, axis=0)
    else:
        mat_sp = np.append(mat_sp, values=np.zeros((mat_sp.shape[0], 1)), axis=1)
        mat_sp = np.append(mat_sp, values=np.zeros((1, mat_sp.shape[1])), axis=0)

The other approach is creating a zero matrix of size dim x dim, building a giant index vector from the mask through two for loops, and then using the index vector to insert the result of the matrix multiplication into the zero matrix. However, this is also super slow.
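Roughly, that second approach looks like this (a sketch with illustrative variable names; the nested loops that build the index lists are the slow part):

```python
import numpy as np

dim = 100
vec = np.random.rand(1, dim)
mask = np.flatnonzero(vec > 0.8)
vec_sp = vec[:, mask]
mat_sp = vec_sp.T * vec_sp

# Build flat row/column index lists with two nested Python loops (slow)
rows, cols = [], []
for i in mask:
    for j in mask:
        rows.append(i)
        cols.append(j)

# Insert the small outer product into the zero matrix
result = np.zeros((dim, dim))
result[rows, cols] = mat_sp.ravel()
```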

Any ideas or insights that could solve the problem efficiently would be great, as the sparse version currently takes 2/3 of the time of the non-sparse one.


Using the example of @hjpaul, we get the following comparison code:

import numpy as np
dims = 400

def test_non_sparse():
    vec = np.random.rand(1, dims)
    a = vec.T * vec

def test_sparse():  
    vec = np.random.rand(1, dims)
    idx = np.flatnonzero(vec>0.75)
    oprod = vec[:,idx].T * vec[:,idx]
    vec_oprod = np.zeros((dims, dims))
    vec_oprod[idx[:,None], idx] = oprod


if __name__ == '__main__':
    import timeit
    print('Non sparse:',timeit.timeit("test_non_sparse()", setup="from __main__ import test_non_sparse", number=10000))
    print('Sparse:',timeit.timeit("test_sparse()", setup="from __main__ import test_sparse", number=10000))

The code gives an improvement that of course depends on the dimensions of the vectors and the number of zeros. Above 300 dimensions and around 70% zeros there is a modest speedup, which increases with the number of zero elements and the dimensions. If the matrices and mask are the same over and over again it is surely possible to get a greater speedup.
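If the mask really is fixed across iterations, one option (a sketch, assuming the set of nonzero positions does not change between calls) is to compute idx and the broadcastable row index once, preallocate the output, and only overwrite the masked block each time:

```python
import numpy as np

dims = 400
vec = np.random.rand(1, dims)
idx = np.flatnonzero(vec > 0.75)   # computed once, reused below
row_idx = idx[:, None]             # column form for broadcasting into 2-D

out = np.zeros((dims, dims))       # preallocated once

for _ in range(5):                 # stand-in for the repeated computation
    vals = np.random.rand(1, dims)
    oprod = vals[:, idx].T * vals[:, idx]
    out[row_idx, idx] = oprod      # overwrite only the masked block
```

Note that this only stays correct because the masked block is the only part ever written; if the mask changed between iterations, stale values would be left behind.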

(My mistake in the fancy indexing was using idx instead of idx[:,None] for the row indices.)


Solution

  • The fastest way to insert one matrix into another is with indices.

    To illustrate with your outer product:

    In [94]: a = np.array(((1,0,2,0)))
    In [95]: idx = np.where(a>0)[0]
    In [96]: idx
    Out[96]: array([0, 2])
    In [97]: a1 = a[idx]
    

    The outer product of the condensed array:

    In [98]: a2 = a1[:,None]*a1
    In [99]: a2
    Out[99]: 
    array([[1, 2],
           [2, 4]])
    

    Set up the result array, and use block indexing to identify where the a2 values go:

    In [100]: res = np.zeros((4,4),int)
    In [101]: res[idx[:,None], idx] = a2
    In [102]: res
    Out[102]: 
    array([[1, 0, 2, 0],
           [0, 0, 0, 0],
           [2, 0, 4, 0],
           [0, 0, 0, 0]])
    

    Direct outer of the uncondensed array:

    In [103]: a[:,None]*a
    Out[103]: 
    array([[1, 0, 2, 0],
           [0, 0, 0, 0],
           [2, 0, 4, 0],
           [0, 0, 0, 0]])
    In [104]: np.outer(a,a)
    Out[104]: 
    array([[1, 0, 2, 0],
           [0, 0, 0, 0],
           [2, 0, 4, 0],
           [0, 0, 0, 0]])
    

    If a was 2d, (1,n), this outer could be written as np.dot(a.T,a). dot involves a sum, in this case over the size-1 dimension.
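    A quick check (nothing beyond what the In/Out session above already shows) that the broadcasted outer product and np.dot agree for a (1, n) array:

    ```python
    import numpy as np

    a = np.array([[1, 0, 2, 0]])          # shape (1, 4)
    outer = a.T * a                        # broadcasting: (4,1) * (1,4) -> (4,4)
    dotted = np.dot(a.T, a)                # matrix product summing the size-1 dim
    assert np.array_equal(outer, dotted)   # identical (4,4) results
    ```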

    I think a would have to be quite sparse to benefit from this extra indexing work. With scipy sparse matrices I find that a sparsity on the order of 1% is needed to see any speed advantage, even when the matrices are premade.


    In [105]: from scipy import sparse
    In [106]: A = sparse.csr_matrix(a)
    In [107]: A
    Out[107]: 
    <1x4 sparse matrix of type '<class 'numpy.int64'>'
        with 2 stored elements in Compressed Sparse Row format>
    In [108]: A.A
    Out[108]: array([[1, 0, 2, 0]], dtype=int64)
    In [109]: A.T*A           # sparse matrix product, dot
    Out[109]: 
    <4x4 sparse matrix of type '<class 'numpy.int64'>'
        with 4 stored elements in Compressed Sparse Column format>
    In [110]: _.A
    Out[110]: 
    array([[1, 0, 2, 0],
           [0, 0, 0, 0],
           [2, 0, 4, 0],
           [0, 0, 0, 0]], dtype=int64)