python arrays sparse-matrix

Python: Convert Sparse Matrix to Array using a For loop


I am using pandas 1.1.5. I created a very large sparse matrix with a bag-of-words model and want to convert it to a dense array, but I get:
MemoryError: Unable to allocate 36.6 GiB for an array with shape (17799, 275656) and data type int64
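
As a quick check of where that figure comes from, a dense array needs rows × columns × bytes per element. The sketch below uses only the shape from the error message; the helper name `dense_gib` and the item-size table are illustrative assumptions, not part of the original post.

```python
# Shape taken from the error message above.
N_ROWS, N_COLS = 17799, 275656

# Bytes per element for a few integer dtypes (standard NumPy item sizes).
ITEMSIZE = {"int64": 8, "int32": 4, "int8": 1}

def dense_gib(dtype_name):
    """Size in GiB of a dense (N_ROWS, N_COLS) array with the given dtype."""
    return N_ROWS * N_COLS * ITEMSIZE[dtype_name] / 2**30

for name in ITEMSIZE:
    print(f"{name}: {dense_gib(name):.1f} GiB")
```

The `int64` line reproduces the 36.6 GiB in the traceback. A smaller dtype shrinks the dense array proportionally, but even one byte per element is still several GiB for this shape.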

I don't have admin rights to increase the memory in Advanced system settings, so I would like to use a for loop to convert the sparse matrix to an array. Or is there a better way? Please assist. Thank you.

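from sklearn.feature_extraction.text import CountVectorizer

vector1 = CountVectorizer(ngram_range=(1, 2))  # unigrams and bigrams
vector1.fit_transform(text).toarray()  # raises the MemoryError above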

Sparse Matrix
(0, 81346) 1
(0, 89381) 1
(0, 120631) 1
(0, 69446) 1
(0, 8579) 1
(0, 8531) 1
.
.
.
(17798, 72613) 1
(17798, 116023) 1
(17798, 25859) 1
(17798, 206370) 1
(17798, 153517) 1
(17798, 26090) 1


Solution

  • You can try downcasting the counts to int8 and splitting the sparse matrix into smaller row blocks:

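    import numpy as np

    NUM_SPLIT = 2

    # int8 is only safe while every count stays at or below 127
    arr = vector1.fit_transform(text).astype(np.int8)

    # Split sparse matrix into NUM_SPLIT small ones; the last boundary is
    # arr.shape[0], so no trailing rows are dropped when the split is uneven
    r = [arr.shape[0] * k // NUM_SPLIT for k in range(NUM_SPLIT + 1)]

    lst = [arr[i:j] for i, j in zip(r, r[1:])]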
    

    Output for a small 4x22 example:

    >>> arr
    <4x22 sparse matrix of type '<class 'numpy.int8'>'
        with 39 stored elements in Compressed Sparse Row format>
    
    >>> lst
    [<2x22 sparse matrix of type '<class 'numpy.int8'>'
        with 19 stored elements in Compressed Sparse Row format>,
     <2x22 sparse matrix of type '<class 'numpy.int8'>'
        with 20 stored elements in Compressed Sparse Row format>]
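
The row blocks above are still sparse. To get the loop the question asks for, each block can be converted to a dense array on its own, used, and then discarded, so only one block is dense at a time. The following is a minimal sketch under assumptions: `iter_dense_blocks` is a hypothetical helper, and the small `counts` matrix stands in for `vector1.fit_transform(text)`.

```python
import numpy as np
from scipy import sparse

def iter_dense_blocks(mat, num_split):
    """Yield (start, stop, dense_block) for num_split row blocks of a CSR matrix."""
    n_rows = mat.shape[0]
    # Integer boundaries that always end exactly at n_rows.
    bounds = [n_rows * k // num_split for k in range(num_split + 1)]
    for start, stop in zip(bounds, bounds[1:]):
        # Only this block is held as a dense array at any one time.
        yield start, stop, mat[start:stop].toarray()

# Hypothetical stand-in for vector1.fit_transform(text): 5 documents, 6 terms.
counts = sparse.csr_matrix(np.array([
    [1, 0, 0, 2, 0, 0],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 3, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 4, 1],
], dtype=np.int8))

# Example per-block work: total term count of each document.
row_totals = np.zeros(counts.shape[0], dtype=np.int64)
for start, stop, block in iter_dense_blocks(counts, num_split=2):
    row_totals[start:stop] = block.sum(axis=1)

print(row_totals)
```

Pick `num_split` so that one dense block fits comfortably in memory. Collecting all dense blocks in a list would rebuild the full array and hit the same limit, so each block should be reduced or written out before the next one is created. Many scikit-learn estimators also accept the sparse matrix directly, in which case no dense conversion may be needed at all.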