Tags: python, matrix, scipy, csr

VSTACK scipy.sparse.csr.csr_matrix to one single csr matrix


I recently struggled with stacking several sparse matrices into a single matrix. I used to create multiple csr_matrix objects like this:

vec_list.append(sp.sparse.csr_matrix(my_vec_i))  # every vector has shape (1, 200)

Once vec_list held around 100 sparse matrices, I used SciPy's (NOT NumPy's) sp.vstack function to merge all 100 entries into a single csr matrix of shape (100, 200).

Now, in my current setup (Python 3.8), I see a warning that sp.vstack is going to be deprecated. In any case, no matter whether I use NumPy's or SciPy's vstack, I end up with an array of shape (100, 1), where the 200 columns of each row are treated as a single csr_matrix entry in the first and only column.

In my old code snippets I can see that sp.vstack(vec_list) created a sparse csr matrix of shape (100, 200). Am I missing anything? Does anyone have thoughts on this? I am getting slightly desperate to create my stacked sparse matrix. Thanks, all.

Edit: As you can see in my comment below, np.vstack and sp.vstack do not necessarily do the same thing (in my answer I said np.vstack twice, but I meant sp.vstack once). I used the exact solution (copied), and it returned an error at some point because no stacking took place. In order to use sp.vstack, I stacked non-csr_matrix arrays and then converted the result to a csr_matrix. This is not practical for huge sets of arrays, but at least I could run through the file without issues. Regarding the answer from Tinu below: I was not able to solve it this way, as the result looks like the following when executing the example code:

>>> np.vstack(vec_list).shape
(100, 1)
>>> sp.vstack(vec_list).shape
(100, 200)

Python 3.8.2, SciPy 1.4.1
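
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if `sp` is the top-level `scipy` package (as `sp.sparse.csr_matrix` in the question suggests), then `sp.vstack` is the NumPy function re-exported by SciPy, and SciPy 1.4 deprecated those top-level NumPy aliases. The sparse-aware function lives in `scipy.sparse`. The sketch below calls it explicitly; the random test data and the variable names are hypothetical stand-ins for the vectors in the question.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in data: 100 mostly-zero row vectors of shape (1, 200).
rng = np.random.default_rng(0)
vec_list = []
for _ in range(100):
    dense_row = np.zeros((1, 200))
    dense_row[0, rng.integers(0, 200, size=5)] = 1.0
    vec_list.append(sparse.csr_matrix(dense_row))

# scipy.sparse.vstack understands sparse blocks and stacks them row-wise;
# format="csr" requests a CSR result.
stacked = sparse.vstack(vec_list, format="csr")
print(stacked.shape)  # (100, 200)
```

Importing `sparse` under its own name avoids any ambiguity about which `vstack` is being called.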


Solution

  • Unfortunately, I was not able to reproduce the result stated above using my Python 3.8.3rc1. Copying the code and stacking led to the following:

    >>> np.vstack(vec_list).shape # (100, 1)
    >>> sp.vstack(vec_list).shape # (100, 1)
    

    What I will do to circumvent my problem: I will stack non-csr_matrix arrays and then convert the result to a csr_matrix. Thanks anyway!
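
The workaround described above could look roughly like the following minimal sketch. The data and the names `dense_rows`, `dense`, and `stacked` are hypothetical; it assumes every vector is available as a dense NumPy array of shape (1, 200) before conversion.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in data: 100 dense row vectors of shape (1, 200).
rng = np.random.default_rng(0)
dense_rows = [rng.integers(0, 2, size=(1, 200)) for _ in range(100)]

# Stack the dense rows first (np.vstack works as expected on plain arrays) ...
dense = np.vstack(dense_rows)

# ... then convert the stacked array to CSR in a single step.
stacked = sparse.csr_matrix(dense)
print(stacked.shape)  # (100, 200)
```

As noted in the question, this builds the full dense array in memory before converting, so it may not be practical for very large inputs.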