40,000 * 40,000
), so I use scispy.sparse format to store half of it, below is my code
import numpy as np
from scipy.sparse import coo_matrix
def to_sparse(W):
tmp = np.tril(W)
del W
return coo_matrix(tmp)
Then I want to calculate the laplacian matrix for W
(the symmetric one), according to the formula L = D - W
, (where D
is the diagonal matrix whose diagonals are sums over either cols or rows of W
), according to definition of L
, I need to check whether it's positive semi-definite(PSD) or not.
But it seems like calculating the eigenvalues of the 'L'
stored in sparse format are not equivalent to the original problem, so I have to convert L
to dense format, and then check? My question: is there a better way to achieve this without converting L
back to dense matrix?
Below is my code to generate L
and check PSD:
from scipy.sparse import diags
from numpy import linalg
def gen_D(W):
# W is sparse matrix
tmp = W.sum(axis = 0)
tmp2 = W.sum(axis = 1).T - W.diagonal()
return np.squeeze(np.asarray(tmp + tmp2))
def gen_laplacian(W):
D = gen_D(W)
return diags(D, 0).tocoo() - W
def check_PSD(L):
origin_L = L.todense() + L.T.todense() - np.diag(L.diagonal())
E, V = linalg.eigh(origin_L)
return np.all(E > -1e-10)
I'm so sorry that I didn't check the code sample before uploading it, and now I've fixed the bugs, and below is a example:
W = np.random.random_integers(1, 100, size = (100, 100))
W = .5 * (W + W.T)
W = to_sparse(W)
L = gen_laplacian(W)
Is the original array already sparse (plenty of zeros), or are those just a product of tril
? If the later, you might not be saving space or time by using sparse code. For example
def gen1(W):
tmp = np.tril(W)
d = tmp.sum(0)+tmp.sum(1)-tmp.diagonal()
return np.diag(d) - tmp
is 8x faster than gen_laplacian(to_sparse(W))
for this 100x100 test.
def check(L):
oL = L+L.T-np.diag(L.diagonal())
E, V = linalg.eigh(oL)
return np.all(E > -1e-10)
Even check(L.A)
is 2x faster.
But do you need to construct orginal_L
? It looks like linalg.eigh(L.A)
produces the same thing, since the calculation is done with the lower triangular part
. Not that constructing orginal_L
takes much time.
But the crux of the testing question is whether a sparse.linalg
function like eigsh
produces something equivalent to eigh
.
I don't know enough about those eigh
functions to help - without digging further into the documentation.