Tags: python, scipy, sparse-matrix, sampling

Easy sampling of vectors from a sparse matrix, and creating a new matrix from the sample (Python)


This question has two parts (maybe one solution?):

Sample vectors from a sparse matrix: Is there an easy way to sample row vectors from a sparse matrix? When I try to sample rows using random.sample, I get a TypeError: sparse matrix length is ambiguous.

from random import sample
import numpy as np
from scipy.sparse import lil_matrix
K = 2
m = [[1,2],[0,4],[5,0],[0,8]]
sample(m,K)    #works OK
mm = np.array(m)
sample(list(mm),K)    #works OK (list() gives the rows of the array; Python 3's random.sample rejects a bare ndarray)
sm = lil_matrix(m)
sample(sm,K)   #throws exception TypeError: sparse matrix length is ambiguous.

My current solution is to sample row indices from the range of row numbers in the matrix and then use getrow(), something like:

indxSampls = sample(range(sm.shape[0]), K)   # K distinct row indices
sampledRows = []
for i in indxSampls:
    sampledRows.append(sm.getrow(i))         # each entry is a 1 x n sparse row

Are there any other efficient or elegant ideas? The dense matrix size is 1000x30000 and could be larger.

Constructing a sparse matrix from a list of sparse vectors: Now imagine I have the list of sampled vectors, sampledRows. How can I convert it to a sparse matrix without densifying it, converting it to a list of lists, and then converting that to a lil_matrix?


Solution

  • Try

    sm[np.random.choice(sm.shape[0], K, replace=False), :]
    

    This gives you an LIL-format matrix containing just K of the rows (in the order returned by np.random.choice). I'm not sure it's super-fast, but it can't really be worse than manually accessing row by row as you're currently doing, and it probably preallocates the result.