python python-2.7 numpy scipy sparse-matrix

Efficiently change dimensions of scipy.sparse.csr_matrix


I have a function that takes a csr_matrix and does some calculations on it.

The behavior of these calculations requires the matrix to have a specific shape (say NxM).

The input I send has fewer columns but exactly the required number of rows.

(e.g. it has shape=(A,B) where A == N and B < M)

For example: I have the object x

>>> from scipy.sparse import csr_matrix
>>> x = csr_matrix([[1,2],[1,2]])
>>> print(x)
(0, 0)  1
(0, 1)  2
(1, 0)  1
(1, 1)  2
>>> x.shape
(2, 2)

And a function f:

def f(csr_mat):
    """csr_mat.shape should be (2,3)"""

Then I want to do something on x, so it will become y:

>>> y = csr_matrix([[1,2,0],[1,2,0]])
>>> print(y)
(0, 0)  1
(0, 1)  2
(1, 0)  1
(1, 1)  2
>>> y.shape
(2, 3)

In this example, x and y have the same non-zero values, but y has a different shape. What I want is to efficiently 'extend' x to a new shape, filling the new columns with zeros. Namely, given x and new_shape=(2,3), it should return y.

I already tried reshape:

x.reshape((2,3))

But then I got:

NotImplementedError

My second option was to create a new csr_matrix with a different shape:

z = csr_matrix(x,shape=(3,3))

But this fails as well:

NotImplementedError: Reshaping not implemented for csr_matrix.

EDIT: using csc_matrix produces the same errors.

Any Ideas?

Thanks


Solution

  • In the CSR format, the underlying data, indices, and indptr arrays for your desired y are identical to those of your x matrix. You can pass those to the csr_matrix constructor with a new shape:

    y = csr_matrix((x.data, x.indices, x.indptr), shape=(2, 3))
    

    Note that the constructor defaults to copy=False, so x and y will share the same data, indices, and indptr arrays. In-place changes to the stored values of y (for example y.data *= 2) will therefore also show up in x. You can pass copy=True to make x and y independent of each other.

    If you want to poke at the undocumented internals of csr_matrix, you can set the private _shape attribute so that the x matrix itself has the shape you want:

    x._shape = (2, 3)
    

    There isn't really an advantage to doing it this way.
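
As a rough illustration of the constructor approach, here is a minimal sketch that wraps it in a small helper. The name extend_csr and its shape check are assumptions made for this example, not part of SciPy. It only allows the number of columns to grow, because indptr has one entry per row plus one and so is tied to the row count.

```python
import numpy as np
from scipy.sparse import csr_matrix


def extend_csr(x, new_shape, copy=False):
    """Hypothetical helper: same stored entries as x, but with more columns.

    The row count must stay the same, since indptr is tied to it.
    """
    n_rows, n_cols = x.shape
    if new_shape[0] != n_rows or new_shape[1] < n_cols:
        raise ValueError("new_shape must keep the rows and not drop columns")
    # Reuse the three CSR arrays; only the declared shape changes.
    return csr_matrix((x.data, x.indices, x.indptr), shape=new_shape, copy=copy)


x = csr_matrix([[1, 2], [1, 2]])

y_view = extend_csr(x, (2, 3))             # copy=False: reuses x's arrays
y_copy = extend_csr(x, (2, 3), copy=True)  # copy=True: independent arrays

print(y_view.shape)
print(y_view.toarray())
print(np.shares_memory(x.data, y_view.data))
print(np.shares_memory(x.data, y_copy.data))
```

The two shares_memory checks are one way to see whether a given result still aliases x before modifying it in place.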