I am trying to do an element-wise multiplication of two large sparse matrices. Both are around 400K x 500K in size, with around 100M non-zero elements.
However, they might not have non-zero elements in the same positions, and they might not have the same number of non-zero elements. In either case, I'm fine with the product of a non-zero value in one matrix and a zero in the other being zero.
I keep running out of memory (8GB) with every approach, which doesn't make much sense; I shouldn't be. This is what I've tried.
A and B are sparse matrices (I've tried both COO and CSC formats).
# I have loaded sparse matrices A and B, and have a file opened in write mode
row, col = A.nonzero()
index = zip(row, col)
del row, col
for i, j in index:
    # Approach 1
    A[i,j] *= B[i,j]
    # Approach 2
    someopenfile.write(' '.join([str(i), str(j), str(A[i,j]*B[i,j]), '\n']))
    # Approach 3
    if B[i,j] != 0:
        A[i,j] = A[i,j]*B[i,j]  # or, I wrote it to a file instead,
                                # like in approach 2
If I comment out the for loop, I see that I use almost 3.5GB of memory. But the moment I run the loop, whether I'm writing the products to a file or back into a matrix, the memory usage shoots up to the full 8GB, forcing me to stop the execution, or the system hangs. How can I do this operation without consuming so much memory?
I suspect that your sparse matrices are becoming non-sparse when you perform the operation. Have you tried just:
A.multiply(B)
I suspect it will be better optimised than anything you can easily write yourself.
If A is not already the correct type of sparse matrix, you might need:
A = A.tocsr()
# May also need
# B = B.tocsr()
A = A.multiply(B)
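To illustrate, here is a minimal sketch (small shapes stand in for the 400K x 500K matrices, and the dense comparison is only feasible because the example is small). It also shows how to dump the non-zero products to a file via the COO attributes, which avoids the per-element indexing that makes the original loop so slow:

```python
import numpy as np
from scipy import sparse

# Small random sparse matrices standing in for the real A and B
A = sparse.random(1000, 1200, density=0.01, format="csr", random_state=0)
B = sparse.random(1000, 1200, density=0.01, format="csr", random_state=1)

# Element-wise product; the result stays sparse because any position
# that is zero in either operand is zero in the product
C = A.multiply(B)
C = sparse.coo_matrix(C)  # multiply() may return COO or CSR depending on version

# C can have at most min(A.nnz, B.nnz) stored entries
print(C.nnz <= min(A.nnz, B.nnz))

# Writing "i j value" triplets to a file, as in approach 2, but iterating
# the COO arrays directly instead of indexing the matrices element-wise
with open("products.txt", "w") as f:
    for i, j, v in zip(C.row, C.col, C.data):
        f.write(f"{i} {j} {v}\n")
```

A single `multiply` call does the coordinate intersection in compiled code, whereas indexing `A[i,j]` in a Python loop performs a sparse lookup (and, for assignment, a possible structure change) on every iteration, which is what blows up time and memory.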