Search code examples
pythonnumpyscipysparse-matrix

How to diagonalise sparse csr 1D-matrix (vector) in Python?


[Short version]

Is there an equivalent to numpy.diagflat() in scipy.sparse? Or any way to 'flatten' a sparse matrix made dense?

[Long version]

I have a sparse matrix (mathematically a vector), x_f, that I need to diagonalise (i.e. create a square matrix with the values of the x_f vector on the diagonal).

x_f
Out[59]: 
<35021x1 sparse matrix of type '<class 'numpy.float64'>'
    with 47 stored elements in Compressed Sparse Row format>

I've tried 'diags' from the scipy.sparse module. (I've also tried 'spdiags', but it's just a more fancy version of 'diags', which I don't need.) I've tried it with every combination of [csr or csc format], [original or transposed vector] and [.todense() or .toarray()], but I keep getting the error:

ValueError: Different number of diagonals and offsets.

With sparse.diags the default offset is 0, and what I'm trying to do is to only put numbers on the main diagonal (which is the default), so getting this error means it's not working as I want it to.

Here are examples of the original and transposed vector with .todense() and .toarray() respectively:

x_f_original.todense()
Out[72]: 
matrix([[  0.00000000e+00],
        [  0.00000000e+00],
        [  0.00000000e+00],
        ..., 
        [  0.00000000e+00],
        [  1.03332178e-17],
        [  0.00000000e+00]])

x_f_transposed.toarray()
Out[83]: 
array([[  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
          0.00000000e+00,   1.03332178e-17,   0.00000000e+00]])

The following code works, but takes about 15 seconds to run:

x_f_diag = sparse.csc_matrix(np.diagflat(x_f.todense()))

Does anyone have any ideas of how to make it more efficient or just a better way to do this?

[Disclaimer]

This is my first question here. I hope I did it right and apologise for anything that's unclear.


Solution

  • In [106]: x_f = sparse.random(1000,1, .1, 'csr')
    In [107]: x_f
    Out[107]: 
    <1000x1 sparse matrix of type '<class 'numpy.float64'>'
        with 100 stored elements in Compressed Sparse Row format>
    

    I can use it in sparse.diags if turn it into a 1d dense array.

    In [108]: M1=sparse.diags(x_f.A.ravel()).tocsr()
    In [109]: M1
    Out[109]: 
    <1000x1000 sparse matrix of type '<class 'numpy.float64'>'
        with 100 stored elements in Compressed Sparse Row format>
    

    Or I can make it a (1,1000) matrix, and use a list as the offset:

    In [110]: M2=sparse.diags(x_f.T.A,[0]).tocsr()
    In [111]: M2
    Out[111]: 
    <1000x1000 sparse matrix of type '<class 'numpy.float64'>'
        with 100 stored elements in Compressed Sparse Row format>
    

    diags takes a dense diagonal, not a sparse. This is stored as is, so I have use the further .tocsr to remove 0s etc.

    In [113]: sparse.diags(x_f.T.A,[0])
    Out[113]: 
    <1000x1000 sparse matrix of type '<class 'numpy.float64'>'
        with 1000 stored elements (1 diagonals) in DIAgonal format>
    

    So either way I am matching the shape of diagonal with the number of offsets (scalar or 1).

    A direct mapping to csr (or csc) is probably faster.

    With this column shape, the indices attribute doesn't tell us anything.

    In [125]: x_f.indices
    Out[125]: 
    array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...0, 0, 0], dtype=int32)
    

    but transform that to csc (this maps the indptr onto indices)

    In [126]: x_f.tocsc().indices
    Out[126]: 
    array([  2,  15,  26,  32,  47,  56,  75,  82,  96,  99, 126, 133, 136,
           141, 145, 149, ... 960, 976], dtype=int32)
    In [127]: idx=x_f.tocsc().indices
    
    In [128]: M3 = sparse.csr_matrix((x_f.data, (idx, idx)),(1000,1000))
    In [129]: M3
    Out[129]: 
    <1000x1000 sparse matrix of type '<class 'numpy.float64'>'
        with 100 stored elements in Compressed Sparse Row format>