Tags: python, matrix, scipy, sparse-matrix

How do I use a scipy CSR sparse matrix as a normal dense matrix without todense()?


I have a problem with sparse matrices in scipy. I want to use them as a normal matrix, but not with the todense() function. I'm new to this field, and I don't know how to get the same result when I multiply the sparse matrix without it being a sparse matrix. I think sparse matrices are only used for faster computation, so it should be possible to do this without one:

sparse_matrix * 5 == sparse_matrix.todense() * 5 == no_sparse_matrix * 5

import numpy as np
import scipy.sparse as sp

data = np.ones(5178)
indices = [34, 12, 545, 23, ..., 25, 18, 29]  # elided here; shape: (5178,)
indptr = np.arange(5178 + 1)

sparse_matrix = sp.csr_matrix((data, indices, indptr), shape=(5178, 3800))

Is this correct: sparse_matrix * 5 == sparse_matrix.todense() * 5 == data * 5?

My goal is to get the same result as multiplying the sparse matrix, but without using a sparse matrix. Is this possible, and how can I do it?


Edit, about my intention: I want to port Python code to Java, and my Java library for linear algebra does not provide sparse matrix operations.

So I have to do the same in Java without sparse matrices. I was not sure whether I can just use the data array instead of a sparse matrix.

In the original code, a sparse matrix is multiplied by another matrix. To port that to Java, I would just multiply the data array of the sparse matrix by the other matrix. Is this correct?


Solution

  • It's not entirely clear what you are asking for, but here's my guess.

    Let's just experiment with a small matrix:

    Start with 3 arrays (I took these from another sparse matrix, but that isn't important):

    In [165]: data
    Out[165]: array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11], dtype=int32)
    
    In [166]: indices
    Out[166]: array([1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3], dtype=int32)
    
    In [167]: indptr
    Out[167]: array([ 0,  3,  7, 11], dtype=int32)
    
    In [168]: M=sparse.csr_matrix((data,indices,indptr),shape=(3,4))
    

    These arrays have been assigned to three attributes of the new matrix:

    In [169]: M.data
    Out[169]: array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11], dtype=int32)
    
    In [170]: M.indices
    Out[170]: array([1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3], dtype=int32)
    
    In [171]: M.indptr
    Out[171]: array([ 0,  3,  7, 11], dtype=int32)
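Since the question asks how to get the dense matrix without todense(), here is a minimal sketch of what that expansion does using only the three arrays. `csr_to_dense` is a hypothetical helper written for illustration, not a SciPy function; the same double loop over plain arrays is what you would write in a language without sparse support.

```python
import numpy as np

def csr_to_dense(data, indices, indptr, shape):
    """Expand the three CSR arrays into a dense 2-D array (hypothetical helper, not part of SciPy)."""
    dense = np.zeros(shape, dtype=np.asarray(data).dtype)
    for row in range(shape[0]):
        # The stored values of this row are data[indptr[row]:indptr[row + 1]],
        # and indices[k] gives the column of data[k].
        for k in range(indptr[row], indptr[row + 1]):
            dense[row, indices[k]] += data[k]  # += sums duplicate (row, col) entries
    return dense

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
indices = np.array([1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3])
indptr = np.array([0, 3, 7, 11])

D = csr_to_dense(data, indices, indptr, (3, 4))
print(D)  # rows: [0 1 2 3], [4 5 6 7], [8 9 10 11]
```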
    

    Now try multiplying the .data attribute:

    In [172]: M.data *= 3
    

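    Lo and behold, we have multiplied the 'whole' array. That works because the implicit zeros stay zero when scaled, so scaling the stored values scales every entry: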

    In [173]: M.A
    Out[173]: 
    array([[ 0,  3,  6,  9],
           [12, 15, 18, 21],
           [24, 27, 30, 33]], dtype=int32)
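Scaling .data in place also changes whatever array the matrix was built from. A minimal sketch, assuming you want the scaled matrix without touching the original `data`: build the matrix with `copy=True`, scale `.data`, and compare against the scaled dense matrix. It uses `.toarray()` rather than `.A`, since the `.A` shorthand is deprecated in newer SciPy versions.

```python
import numpy as np
from scipy import sparse

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
indices = np.array([1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3])
indptr = np.array([0, 3, 7, 11])
# copy=True so that scaling M.data below does not also change the caller's `data`
M = sparse.csr_matrix((data, indices, indptr), shape=(3, 4), copy=True)

dense_before = M.toarray()
M.data *= 3  # scales only the stored values; implicit zeros stay zero

print(np.array_equal(M.toarray(), dense_before * 3))  # True
print((data * 3).shape)  # (11,): the raw data array alone is not a 3x4 matrix
```

The last line is the key point for the question: `data * 3` gives the scaled stored values, and you still need `indices` and `indptr` to know where they sit.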
    

    Of course we can also multiply the matrix directly. That is, multiplication by a constant is defined for csr sparse matrices:

    In [174]: M *= 2
    
    In [175]: M.A
    Out[175]: 
    array([[ 0,  6, 12, 18],
           [24, 30, 36, 42],
           [48, 54, 60, 66]], dtype=int32)
    
    In [176]: M.data
    Out[176]: array([ 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66], dtype=int32)
    

    Out of curiosity, let's look at the source array. It too has changed, so M.data points to the same array as data: change one, and you change the other.

    In [177]: data
    Out[177]: array([ 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66], dtype=int32)
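One way to check this link without mutating anything is `np.shares_memory`. A minimal sketch, assuming the same three int32 arrays as in the session: the default construction shares the data buffer, while passing `copy=True` gives the matrix its own buffer.

```python
import numpy as np
from scipy import sparse

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], dtype=np.int32)
indices = np.array([1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3], dtype=np.int32)
indptr = np.array([0, 3, 7, 11], dtype=np.int32)

shared = sparse.csr_matrix((data, indices, indptr), shape=(3, 4))  # copy=False is the default
private = sparse.csr_matrix((data, indices, indptr), shape=(3, 4), copy=True)

print(np.shares_memory(shared.data, data))   # True: same buffer as `data`
print(np.shares_memory(private.data, data))  # False: copy=True made its own buffer
```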
    

    So when the matrix is created this way, it is possible to multiply it by a scalar in several different ways.

    Which is best? Directly multiplying the .data attribute might be faster than multiplying the matrix. But you should be aware of the differences between manipulating .data directly and using the defined math operations for the whole matrix. For example, M*N performs matrix multiplication, and that product depends on indices and indptr as well as data, so it cannot be reproduced by multiplying .data alone. You really should understand the matrix data structure before you try changing its internals directly.
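For the port described in the question, the product of a CSR matrix with a dense matrix can be written with plain arrays, but it needs all three arrays, not just `data`. A minimal sketch using NumPy only for storage; `csr_matmul_dense` is a hypothetical helper, and the 4x2 matrix `B` is arbitrary test input.

```python
import numpy as np

def csr_matmul_dense(data, indices, indptr, n_rows, B):
    """Compute (CSR matrix) @ B from the three CSR arrays (hypothetical helper)."""
    B = np.asarray(B)
    result = np.zeros((n_rows, B.shape[1]), dtype=np.result_type(data, B))
    for row in range(n_rows):
        for k in range(indptr[row], indptr[row + 1]):
            # data[k] sits in column indices[k], so it multiplies row indices[k] of B
            result[row, :] += data[k] * B[indices[k], :]
    return result

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
indices = np.array([1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3])
indptr = np.array([0, 3, 7, 11])
B = np.arange(8).reshape(4, 2)  # an arbitrary dense 4x2 matrix

C = csr_matmul_dense(data, indices, indptr, 3, B)
print(C)
```

The loops touch only stored entries, which is where the sparse format saves work; the same structure translates directly to nested loops over Java arrays.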

    The ability to modify data, the source array, depends on creating the matrix exactly this way and keeping that pointer link. If you define it via a coo matrix (or coo-style inputs), the data link is not maintained. And M1 = M*2 is not going to pass this link on to M1.
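Both cases can be checked with `np.shares_memory`. A minimal sketch on a small hypothetical 3x3 matrix with one entry per row: COO-style `(data, (row, col))` input is converted into freshly built CSR arrays, `(data, indices, indptr)` input keeps the link, and arithmetic such as `M2 * 2` returns a matrix with a new data array.

```python
import numpy as np
from scipy import sparse

data = np.array([1.0, 2.0, 3.0])
row = np.array([0, 1, 2])
col = np.array([2, 0, 1])

# COO-style input: the CSR matrix gets newly built arrays
M = sparse.csr_matrix((data, (row, col)), shape=(3, 3))
print(np.shares_memory(M.data, data))   # False

# (data, indices, indptr) input keeps the link...
indptr = np.array([0, 1, 2, 3])
M2 = sparse.csr_matrix((data, col, indptr), shape=(3, 3))
print(np.shares_memory(M2.data, data))  # True

# ...but a matrix produced by arithmetic does not
M3 = M2 * 2
print(np.shares_memory(M3.data, data))  # False
```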

    Get your code working with the normal math operations sparse has defined. Later, if you still need to squeeze out more speed, you can dig into the internals and streamline selected operations.
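The matrix in the question has a special structure: with `indptr = np.arange(n + 1)`, row i holds exactly one stored value, `data[i]`, at column `indices[i]`. A minimal sketch on a small stand-in, since the real indices are elided in the question (the 5x4 shape and the `indices` values below are hypothetical): the dense equivalent can be built with fancy indexing, scalar multiplication matches, and a matrix product reduces to scaling selected rows of the other matrix.

```python
import numpy as np
from scipy import sparse

n_rows, n_cols = 5, 4                # small stand-in for the question's 5178 x 3800
indices = np.array([3, 0, 2, 2, 1])  # hypothetical column indices, one per row
data = np.ones(n_rows)
indptr = np.arange(n_rows + 1)       # row i owns exactly data[i] at column indices[i]

M = sparse.csr_matrix((data, indices, indptr), shape=(n_rows, n_cols))

# Plain-array equivalent: put data[i] at (i, indices[i])
dense = np.zeros((n_rows, n_cols))
dense[np.arange(n_rows), indices] = data

B = np.arange(8.0).reshape(n_cols, 2)  # arbitrary dense right-hand matrix

print(np.array_equal((M * 5).toarray(), dense * 5))        # True
print(np.array_equal(M @ B, data[:, None] * B[indices]))   # True: row i is data[i] * B[indices[i]]
print((data * 5).shape)                                    # (5,): not the 5x4 matrix
```

So for this structure, `data * 5` alone is not the scaled matrix, but a Java port needs only `data`, `indices`, and a row lookup into the other matrix.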