Search code examples
pythonnumpyscipysparse-matrix

Concatenating specific rows from a sparse matrix in scipy


I have a large sparse matrix (using scipy.sparse) with I rows and U columns, U is much greater than I. I have a list of U random numbers in the range of 0:I. I would like to create a new sparse matrix which will be a U * U sparse matrix, the row for user u will hold all the U values in row i of the original sparse matrix. For example, if the original matrix is a 3*5 matrix:

0,0,2,1,0
0,0,3,4,1
1,1,0,2,0

and the list of random numbers is [0,0,2,1,2]

The resulting matrix should be:

0,0,2,1,0
0,0,2,1,0
1,1,0,2,0
0,0,3,4,1
1,1,0,2,0

I am using this code now, which is very very slow:

for u in range(U):
    i= random_indices[u]
    if u == 0:
        output_sparse_matrix = original_sparse_matrix[i, :]
    else:
        output_sparse_matrix = vstack((output_sparse_matrix,
                                       original_sparse_matrix[i, :]))

Any suggestions on how this can be done quicker?

Update I used Jérôme Richard's suggestion, but inside a loop - since I got an out of memory error. This is the solution that worked:

bins = np.array_split(random_indices, 10)
output_sparse_matrix = original_sparse_matrix[bins[0]]

for bin in bins[1:10]:
   output_sparse_matrix = vstack((output_sparse_matrix ,original_sparse_matrix[bin]))

Solution

  • vstack create a new matrix for every iteration. This is the main source of slowdown since the complexity of the algorithm is O(U^3). You can just append the new lines in a Python list and then vstack the list of lines. Alternatively, a better approach is just to use the following Numpy expression :

    original_sparse_matrix[random_indices, :]