python sparse-matrix text-mining word-count svd

Python kernel dies when performing SVD on a sparse symmetric matrix


I would like to reproduce the SVD method described in a Stanford lecture on my own dataset. The lecture slide is shown below:

[Image: Stanford lecture slide]

My dataset is of the same type: a word co-occurrence matrix M of size

<13840x13840 sparse matrix of type '<type 'numpy.int64'>' 
with 597828 stored elements in Compressed Sparse Column format>

It was generated and processed with CountVectorizer(). Note that it is a symmetric matrix.

However, when I try to extract features with SVD, none of the following attempts work:

1st try:

scipy.linalg.svd(M)

I tried converting the sparse matrix with both todense() and toarray(). My computer ran for quite a few minutes and then reported that the kernel had died. I also experimented with different parameter settings.
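For a sense of scale, the dense footprint can be estimated with simple arithmetic. This is a rough sketch only: it assumes 8-byte values (float64 or int64) and ignores the extra workspace the LAPACK routine allocates, so the real peak is higher.

```python
# Rough memory estimate for a dense SVD of the matrix in the question.
n = 13840                # matrix dimension taken from the sparse repr above
bytes_per_value = 8      # float64 and int64 both use 8 bytes per entry

dense_bytes = n * n * bytes_per_value
dense_gib = dense_bytes / 2**30

print(f"one dense {n}x{n} array: {dense_gib:.2f} GiB")
# A full SVD of a square matrix returns U and Vt with the same shape as M,
# so at least three such arrays are alive at once (workspace not counted).
print(f"M, U and Vt together: at least {3 * dense_gib:.2f} GiB")
```

By comparison, the sparse matrix stores only 597828 values, which is why the problem appears only after todense() or toarray().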

2nd try:

scipy.sparse.linalg.svds(M)

I also tried changing the matrix dtype from int64 to float64, but the kernel died after 30 seconds or so.

Could anyone suggest a way to perform SVD on this matrix?

Thank you so much


Solution

  • It seems that the matrix is too large for the available memory. You have several options:

    1. Perform an adaptive SVD,
    2. Use modred,
    3. Use the SVD from dask.

    The latter two should work out of the box. All of these options load only as much data as fits in memory.
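A related way to keep memory bounded is to compute only the top k singular triplets instead of the full decomposition. The sketch below is not one of the three options listed above. It uses scipy.sparse.linalg.svds with an explicit k on a float copy of the matrix, and a small random symmetric matrix stands in for M. The stand-in data, the choice k = 10, and the scaling of U by the singular values are all assumptions made for illustration, and whether this avoids the crash reported in the question has not been tested on the real data.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import svds

# Hypothetical stand-in for the co-occurrence matrix M: sparse, integer,
# and symmetric (built by adding a random sparse matrix to its transpose).
rng = np.random.default_rng(0)
counts = rng.integers(1, 10, size=(200, 200)) * (rng.random((200, 200)) < 0.05)
M = sparse.csc_matrix(counts + counts.T).astype(np.int64)

k = 10  # number of singular triplets to keep (illustrative choice)

# svds works in floating point, so cast the integer counts first.
# The matrix stays sparse; only k singular vectors are materialised.
U, s, Vt = svds(M.astype(np.float64), k=k)

# The order of the returned singular values is not guaranteed to be
# descending, so sort explicitly.
order = np.argsort(s)[::-1]
U, s, Vt = U[:, order], s[order], Vt[order, :]

# One k-dimensional vector per word; scaling by s is an optional choice.
word_vectors = U * s
print(word_vectors.shape)
```

With k much smaller than the matrix dimension, the outputs are an n x k array, k values, and a k x n array, rather than the full n x n factors that scipy.linalg.svd produces.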