Search code examples
pythonnumpyscipylinear-algebrasparse-matrix

A better way to check semi-definite for symmetric matrix stored in scipy sparse matrix?


I have a very large symmetric matrix to store and manipulate in RAM (about40,000 * 40,000), so I use scispy.sparse format to store half of it, below is my code
import numpy as np
from scipy.sparse import coo_matrix
def to_sparse(W):
    tmp = np.tril(W)
    del W
    return coo_matrix(tmp)
Then I want to calculate the laplacian matrix for W(the symmetric one), according to the formula L = D - W, (where D is the diagonal matrix whose diagonals are sums over either cols or rows of W), according to definition of L, I need to check whether it's positive semi-definite(PSD) or not. But it seems like calculating the eigenvalues of the 'L' stored in sparse format are not equivalent to the original problem, so I have to convert L to dense format, and then check? My question: is there a better way to achieve this without converting L back to dense matrix?

Below is my code to generate L and check PSD:

from scipy.sparse import diags
from numpy import linalg
def gen_D(W):
    # W is sparse matrix
    tmp = W.sum(axis = 0)
    tmp2 = W.sum(axis = 1).T - W.diagonal()
    return np.squeeze(np.asarray(tmp + tmp2))
def gen_laplacian(W):
    D = gen_D(W)
    return diags(D, 0).tocoo() - W
def check_PSD(L):
    origin_L = L.todense() + L.T.todense() - np.diag(L.diagonal())
    E, V = linalg.eigh(origin_L)
    return np.all(E > -1e-10)

I'm so sorry that I didn't check the code sample before uploading it, and now I've fixed the bugs, and below is a example:

W = np.random.random_integers(1, 100, size = (100, 100))
W = .5 * (W + W.T)
W = to_sparse(W)
L = gen_laplacian(W)

Solution

  • Is the original array already sparse (plenty of zeros), or are those just a product of tril? If the later, you might not be saving space or time by using sparse code. For example

    def gen1(W):
        tmp = np.tril(W)
        d = tmp.sum(0)+tmp.sum(1)-tmp.diagonal()
        return np.diag(d) - tmp
    

    is 8x faster than gen_laplacian(to_sparse(W)) for this 100x100 test.

    def check(L):
        oL = L+L.T-np.diag(L.diagonal())
        E, V = linalg.eigh(oL)
        return np.all(E > -1e-10)
    

    Even check(L.A) is 2x faster.

    But do you need to construct orginal_L? It looks like linalg.eigh(L.A) produces the same thing, since the calculation is done with the lower triangular part. Not that constructing orginal_L takes much time.

    But the crux of the testing question is whether a sparse.linalg function like eigsh produces something equivalent to eigh.

    I don't know enough about those eigh functions to help - without digging further into the documentation.