Search code examples
pythonnumpyscipyeigenvectorarpack

All eigenvectors of large sparse matrices are zero


I have a 50,000 by 50,000 dense matrix or larger. If I use the numpy or scipy- packages the entries of all my eigenvectors are 0. If I use scipy.sparse to calculate just 1000-8000 eigenvectors, I get the right eigenvectos. But I need all of them.

Could this be a memory problem? Or what is the reason for such a problem? Can I use LAPACK or ARPACK to calculate the right eigenvectors?

Please note my matrices are representations of networkx graphs and therefore sparse matrices. I transform them to dense matrices for using numpy.linalg, otherwise I work with scipy.sparse.linalg.


Solution

  • The issue regarding numpy.linalg.eig() and scipy.linalg.eig() may be a memory error related to the size of the matrix: a 50000x50000 double precision dense matrix occupies 18Go of memory.

    numpy.linalg.eig() and scipy.linalg.eig() rely on the routine DGEEV() of LAPACK. LAPACK DGEEV() and DGEEVX() makes use of the QR algorithm to compute all eigenvalues and eigenvectors of a dense matrix. First, the matrix is reduced to to upper Hessenberg form using dgehrd(), then QR iterations are performed in dhseqr(). In DGEEVX(), the matrix is first balanced and scaled.

    On the other hand,scipy.sparse.linalg.eigs() and scipy.sparse.linalg.eigsh() rely on ARPACK functions implementing the Implicitly Restarted Arnoldi Method and Implicitly Restarted Lanczos Method on sparse matrix. Both are improvements of the power method and are very efficient at computing the largest eigenvalues/eigenvectors with improved accuracy. If Ax=b can be rapidly solved, these iterative methods are also very efficent at finding the smallest eigenvalues/eigenvectors, or eigenvalues/eigenvectors near a given value.

    The difference between these methods is explained in Lloyd N. Trefethen and David Bau, NUMERICAL LINEAR ALGEBRA, Lecture 33. The Arnoldi Iteration.

    ... whereas the Arnoldi iteration is based upon the QR factorization (33.7) of the matrix ,whose columns are b, A b, ... ,A^{n-1} b, simultaneous iteration and the QR algorithm are based upon the QR factorization (28.16) of the matrix whose columns are A^n e_1 ... , A^n e_n .

    From a practical viewpoint, the Arnoldi iteration is always applied to retreive a limited number of eigenvectors significantly lower than the size of the matrix. Nevertheless, the argument sigma of scipy.sparse.linalg.eigs() or scipy.sparse.linalg.eigsh() enables finding interior eigenvalues and eigenvectors near sigma. Hence, it is possible to call scipy.sparse.linalg.eigsh() multiple times using different sigma. If the eigenvalues all feature a limited multiplicity, all eigenvectors can be recovered. Potentiel duplicates are to be avoided by separating eigenvalues and tracking their multiplicity.

    The basic call using sigma writes:

    sparsevalue, sparsevector = sp.sparse.linalg.eigsh(M.copy(), k = kkeig,sigma=sig, which='LM', mode='normal')
    

    If the matrix is Hermitian semi Here is a sample code, based on your previous deleted post, that computes all eigenvalues of a positive semi-definite sparse matrix. As the eigenvalues are all real and positive, sigma is progressively increased to find all eigenvalues:

    import numpy as np
    import networkx as nx
    import scipy as sp
    
    
    
    
    def tolist(sparsevalue, keeplast=False):
        listofeigval=[]
        listofmultiplicity=[]
    
        curreig=sparsevalue[0]-1.1*np.abs(sparsevalue[0])
        for i in range(len(sparsevalue)):
               #print curreig, sparsevalue[i], len(listofeigval)
               if np.abs(curreig-sparsevalue[i])<1e-11*(np.abs(curreig)+np.abs(sparsevalue[i])):
                    listofmultiplicity[-1]=listofmultiplicity[-1]+1
               else :
                    curreig=sparsevalue[i]
                    listofeigval.append(curreig)
                    listofmultiplicity.append(1)
    
        if keeplast==False:
            #the last one is not sure regarding multiplicity:
            listofeigval.pop()
            listofmultiplicity.pop()
    
        return listofeigval,listofmultiplicity
    
    def test():
        N_1 = 2000
        R_1 = 0.1
        k = 0
        iterations = 1
        while k < iterations:
          G = nx.random_geometric_graph(N_1, R_1)
          if nx.is_connected(G) == True:
              print 'got connected network'
              k = k+1
              M=nx.laplacian_matrix(G)      #M is here a sparse matrix
              M = M.astype(float)
              #M[0,0]=M[0,0]+1.  # this makes the laplacian_matrix positive definite.
              #sparsevalue, sparsevector = sp.sparse.linalg.eigsh(M, k = N_1-2)
              kkeig=400
              sparsevalue, sparsevector = sp.sparse.linalg.eigsh(M.copy(), k = kkeig, which='SM')
              print sparsevalue
    
              listofeigval=[]
              listofmultiplicity=[]
              listofeigval,listofmultiplicity=tolist(sparsevalue)
              print len(listofeigval), len(listofmultiplicity)
    
              nbeigfound=0
              for mul in listofmultiplicity:
                  nbeigfound=nbeigfound+mul
              keepgoing=True
              while( nbeigfound<N_1):
                   print '(',nbeigfound,'/',N_1,')  is ', listofeigval[-1]
                   meanspacing=0.
                   meanspacingnb=0.
    
                   for i in range(10):
                         meanspacingnb=meanspacingnb+listofmultiplicity[len(listofeigval)-i-1]
                   meanspacing=(listofeigval[-1]-listofeigval[len(listofeigval)-10])/meanspacingnb
                   sig=listofeigval[-1]+0.1*kkeig*meanspacing
    
                   keeplast=False
                   if nbeigfound<N_1-0.5*kkeig:
                       sparsevalue, sparsevector = sp.sparse.linalg.eigsh(M.copy(), k = kkeig,sigma=sig, which='LM', mode='normal')
                   else:
                       keeplast=True
                       sparsevalue, sparsevector = sp.sparse.linalg.eigsh(M.copy(), k = kkeig, which='LM')
                   listofneweigval,listofnewmultiplicity=tolist(sparsevalue,keeplast)
                   #need to retreive the starting point
                   index=len(listofeigval)-2*kkeig/10
                   if listofneweigval[1]>listofeigval[index]:
                       while(np.abs(listofneweigval[1]-listofeigval[index])>1e-10*(np.abs(listofneweigval[1])+np.abs(listofeigval[index]))):
                           index=index+1
                   else:
                       while(np.abs(listofneweigval[1]-listofeigval[index])>1e-10*(np.abs(listofneweigval[1])+np.abs(listofeigval[index]))):
                           index=index-1
                   #The first matching eigenvalue is found.
                   #zipping the list and checking if it works.
                   i=1
                   while index<len(listofeigval) and i<len(listofneweigval) :
                        if (np.abs(listofneweigval[i]-listofeigval[index])>1e-10*(np.abs(listofneweigval[i])+np.abs(listofeigval[index]))):
                             print 'failed after ', index, ' different eigenvalues', ': wrong eigenvalue'
                             return listofeigval[0:index], listofmultiplicity[0:index], 1
                        if listofmultiplicity[index] != listofnewmultiplicity[i] :
                             print 'failed after ', index, ' different eigenvalues', ': wrong multiplicity'
                             return listofeigval[0:index], listofmultiplicity[0:index], 2
                        index=index+1
                        i=i+1
                   #adding the remaining eigenvalues.
                   while i<len(listofneweigval) :
                        listofeigval.append(listofneweigval[i])
                        listofmultiplicity.append(listofnewmultiplicity[i])
                        nbeigfound=nbeigfound+listofnewmultiplicity[i]
                        i=i+1
              print 'number of eigenvalues: ', nbeigfound
              nbl=0
              for i in range(len(listofeigval)):
                   print 'eigenvalue ',i,' (',nbl,'/',N_1,')  is %14.8f'% listofeigval[i], ' with multiplicity', listofmultiplicity[i]
                   nbl=nbl+listofmultiplicity[i]
    
              return listofeigval, listofmultiplicity, 0
    
    
              #sig=39.1
              #sparsevalue, sparsevector = sp.sparse.linalg.eigsh(M.copy(), k = 100,sigma=sig, which='LM', mode='normal')
              #print sparsevalue
    
    test()
    

    For Hermitian matrices featuring real positive and negative eigenvalues, both positive and negative values of sigma are to be explored. If the matrix is not Hermitian, the eigenvalues may not be real and values of sigma on the complex plane are to be chosen. Searching first for the magnitude of the largest eigenvalue of A limits the area to a disk.

    The proposed method is very slow and may not always work. It worked once for a 20000x20000 matrix, using 1Go of memory.