I have a large matrix (236680*236680), and my pc does not have sufficient memory to read in the complete matrix so that I am thinking the Scipy sparse matrix. My goal is to multiply a generated matrix (not sparse) by np.eye(the number of observation)-np.ones(the number of observation)/the number of observation with a sparse matrix.
In Scipy, I use the following code, but the computation is still huge. My questions include:
-
from scipy.sparse import lil_matrix
fline=5
nn=1/fline
M=lil_matrix((fline,fline))
M.setdiag(values=1-nn,k=0)
for i in range(fline)[1:]:
M.setdiag(values=0-nn,k=i)
M.setdiag(values=0-nn,k=-i)
#the first matrix is:
array([[ 0.8, -0.2, -0.2, -0.2, -0.2],
[-0.2, 0.8, -0.2, -0.2, -0.2],
[-0.2, -0.2, 0.8, -0.2, -0.2],
[-0.2, -0.2, -0.2, 0.8, -0.2],
[-0.2, -0.2, -0.2, -0.2, 0.8]])
#the second matrix is:
array([[0., 0., 0., 1., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 0., 0.],
[1., 0., 1., 0., 0.]])
a2=M.dot(B)
#the final expected results
array([[-0.2, 0. , -0.2, 0.6, 0. ],
[-0.2, 0. , -0.2, -0.4, 0. ],
[-0.2, 0. , -0.2, 0.6, 0. ],
[-0.2, 0. , -0.2, -0.4, 0. ],
[ 0.8, 0. , 0.8, -0.4, 0. ]])
Updated: is there any way to improve the speed of the cross product? Numpy dot and Scipy sparse dot functions are tested.
Scipy sparse matrix is used, since one of the matrics is a sparse matrix and the cross product function in the sparse matrix is the fastest between Numpy and Scipy.
For the first question, @Tai's answer is the foundation, but I use numpy.full function (a little bit faster).
For the second question, dividing the whole matrix and save smaller computed matrices in files are used.
from scipy import sparse
from scipy.sparse import vstack
import h5sparse
import numpy as num
fline=236680
nn=1/fline; dd=1-nn; off=0-nn
div=int(fline/(61*10))
for i in range(61*10):
divM= num.full((fline, div), off) + sparse.identity(fline,format='csc')[:,0+div*i:div+div*i]
vs=[]
for j in range(divM.shape[1]):
divMB=csr_matrix(divM.T[j]).dot(weights)
vs.append(divMB)
divapp=vstack(vs)
if i ==0:
h5f = h5sparse.File("F:/dissertation/dallastest/temp/tt1.h5")
h5f.create_dataset('sparse/matrix', data=divapp, chunks=(389,),maxshape=(None,))
else:
h5f['sparse/matrix'].append(divapp)