I'm using pandas 1.1.5. I created a very large sparse matrix using a bag-of-words model (CountVectorizer) and want to convert it to a dense array, but I get:
MemoryError: Unable to allocate 36.6 GiB for an array with shape (17799, 275656) and data type int64
I don't have admin rights to increase the memory in Advanced system settings, so I would like to use a for loop to convert the sparse matrix to an array. Or is there a better way? Please assist. Thank you.
from sklearn.feature_extraction.text import CountVectorizer
vector1 = CountVectorizer(ngram_range=(1,2))
vector1.fit_transform(text).toarray()
Sparse matrix:
(0, 81346) 1
(0, 89381) 1
(0, 120631) 1
(0, 69446) 1
(0, 8579) 1
(0, 8531) 1
.
.
.
(17798, 72613) 1
(17798, 116023) 1
(17798, 25859) 1
(17798, 206370) 1
(17798, 153517) 1
(17798, 26090) 1
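For reference, the 36.6 GiB in the error message follows directly from the shape and dtype: a dense array stores every cell, including the zeros. A minimal sketch of that arithmetic (the helper name `dense_gib` is made up for illustration), which also shows what a 1-byte dtype such as int8 would need:

```python
# Dense size = rows * columns * bytes per element.
rows, cols = 17799, 275656  # shape taken from the error message


def dense_gib(rows, cols, itemsize):
    """Size in GiB of a dense rows x cols array with the given item size."""
    return rows * cols * itemsize / 2**30


int64_gib = dense_gib(rows, cols, 8)  # int64 uses 8 bytes per element
int8_gib = dense_gib(rows, cols, 1)   # int8 uses 1 byte per element

print(f"int64: {int64_gib:.1f} GiB, int8: {int8_gib:.1f} GiB")
# prints: int64: 36.6 GiB, int8: 4.6 GiB
```

So even with the smallest integer type, the full dense array still needs several GiB at once.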
You can try:
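import numpy as np
NUM_SPLIT = 2
# int8 stores each count in 1 byte (assumes no count exceeds 127)
arr = vector1.fit_transform(text).astype(np.int8)
# Split sparse matrix into NUM_SPLIT small ones; these row boundaries
# cover every row even when the row count is not divisible by NUM_SPLIT
r = [i * arr.shape[0] // NUM_SPLIT for i in range(NUM_SPLIT + 1)]
lst = [arr[i:j] for i, j in zip(r, r[1:])]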
Output (for a small 4x22 example):
>>> arr
<4x22 sparse matrix of type '<class 'numpy.int8'>'
with 39 stored elements in Compressed Sparse Row format>
>>> lst
[<2x22 sparse matrix of type '<class 'numpy.int8'>'
with 19 stored elements in Compressed Sparse Row format>,
<2x22 sparse matrix of type '<class 'numpy.int8'>'
with 20 stored elements in Compressed Sparse Row format>]
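If the goal is the for loop from the question, one option is to densify a single row block at a time, use it, and discard it before moving on, so the full dense array never exists in memory. The following is only a sketch under assumptions: the small random matrix stands in for `vector1.fit_transform(text)`, and the column sum is just a placeholder for whatever per-chunk work is actually needed.

```python
import numpy as np
from scipy import sparse

# Stand-in for vector1.fit_transform(text): a small random 0/1 count matrix.
arr = sparse.random(7, 22, density=0.2, format="csr", random_state=0)
arr.data[:] = 1
arr = arr.astype(np.int8)

NUM_SPLIT = 2
# Row boundaries that cover every row, even if the split is uneven.
bounds = [i * arr.shape[0] // NUM_SPLIT for i in range(NUM_SPLIT + 1)]

# Placeholder per-chunk work: accumulate column totals.
col_sums = np.zeros(arr.shape[1], dtype=np.int64)
for start, stop in zip(bounds[:-1], bounds[1:]):
    dense_chunk = arr[start:stop].toarray()  # only this block is dense
    col_sums += dense_chunk.sum(axis=0)
    del dense_chunk  # release the block before the next iteration
```

Increase `NUM_SPLIT` until one dense block fits comfortably in memory. It may also be worth checking whether the dense conversion is needed at all, since many scikit-learn estimators accept the sparse matrix directly.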