Intel MKL LAPACKE_dsyevr on large NxN matrices with N~8*10^4

I am trying to diagonalize (all eigenvalues and eigenvectors) of a matrix NxN with dimension N = 80,000. I use intel MKL 2018 on a machine with ~3TB RAM, and use the intel LAPACKE_dsyevr routine. I attach a code below which creates a matrix of size NxN, and then passes it to the dsyevr routine to find the eigenvalues. While the code works for N till ~ 50000, this fails for case of N ~ 80000, and a memory exception is thrown. This cannot be due to the memory of my machine, since the code happily stays in the routine dsyevr routine for a long time before crashing with a segmentation fault.

Please note that I do need to declare the matrix as a std::vector, due to my main code structure. Since the MKL C routine, needs the matrix as a pointer, I make the pointer point to the first element of the vector.

The code is here

    // this uses relatively robust representations 
    #include<iostream>
    #include<fstream>
    #include<stdlib.h>
    #include<stdio.h>
    #include<vector>
    #include<iterator>
    #include<math.h>
    #include "mkl_lapacke.h"
    #include "mkl.h"
    /* Auxiliary routines prototypes */
    extern void print_matrix( char* desc, MKL_INT m, MKL_INT n, double* a,   MKL_INT lda );
    extern void fileprint_matrix( char* desc, MKL_INT m, MKL_INT n, double* a, MKL_INT lda );

    /* Main program */
    int main() {
    unsigned int k;
    unsigned int i,j;
    MKL_INT N, NSQ, LDA, LDZ, NSELECT, info;    
    MKL_INT il, iu, m;
    double abstol, vl, vu;
    N=80000;
    NSQ=N*N;
    LDA=N;
    LDZ=N;

    /* allocate space for the arrays */
    std::vector<MKL_INT> isuppz(2*N,0); 
    std::vector<double> w(N,0.0);
    std::vector<double> z(NSQ,0.0);
    std::vector<double> a(NSQ,0.0);
    std::cout<<"size of vector "<<a.size()<<std::endl;
    std::cout<<"maximum capacity of vector "<<a.max_size()<<std::endl;
    // make the matrix
    k=0;
    for(size_t pos=0; pos < a.size(); pos++){
      i=k/N; j=k%N; 
      a.at(pos) = 1.0/(i+j+1) + (i+j)*exp(-0.344*(i+j));
      k++;
    }
    double* A = &a[0];
    /* print full matrix */
    //print_matrix("Full matrix", N, N, A, LDA);
    /* Executable statements */
    printf( "LAPACKE_dsyevr (column-major, high-level) RRR Example Program  Results\n" );
    /* Solve eigenproblem */
    abstol=-1;
    MKL_INT* ISUPPZ = &isuppz[0];
    double* W = &w[0];
    double* Z = &z[0];
    info = LAPACKE_dsyevr( LAPACK_COL_MAJOR, 'V', 'A', 'U', N, A, LDA,
                        vl, vu, il, iu, abstol, &m, W, Z, LDZ, ISUPPZ );
    printf("Back from the routine and working properly. Info = %d.  Eval[0]=%e\n",info,W[0]);
    //printf("No of eigenvalues = %d\n",m);
    /* Check for convergence */
   if( info > 0 ) {
       printf( "The algorithm failed to compute eigenvalues.\n" );
       exit( 1 );
   }
   isuppz.clear(); 
   w.clear();
   z.clear();
   a.clear();
   exit( 0 );  
   }

Solution

OK, my friend. What do you expect? Let me start by congratulating you on the interesting problem. I'd love to solve it myself. Having said that:

a is 80000*80000*8 = 47.7GB. This is only to store you matrix. But that's not all. You also have z and some ridiculously small vectors of 80000 doubles. So you start off by allocating 96GB. How much RAM do you have available to your process? But this is not your problem it seems. Your problem is LWORK and with that W. The documentation asks for LWORK to need at least 1.26*N:

LWORK is INTEGER
  The dimension of the array WORK.  LWORK >= max(1,26*N).
  For optimal efficiency, LWORK >= (NB+6)*N,
  where NB is the max of the blocksize for DSYTRD and DORMTR
  returned by ILAENV.

  If LWORK = -1, then a workspace query is assumed; the routine
  only calculates the optimal size of the WORK array, returns
  this value as the first entry of the WORK array, and no error
  message related to LWORK is issued by XERBLA.

But really, what you should do is to get a calculation for LWORK from the algorithms by setting it once to -1 as suggested above. It will give you a proper number for LWORK in the first cell of WORK:

Something like this:

LWORK=-1;
std::vector<double> w(1);
// Workspace estimation
info = LAPACKE_dsyevr( LAPACK_COL_MAJOR, 'V', 'A', 'U', N, A, LDA,
                    vl, vu, il, iu, abstol, &m, W, Z, LDZ, ISUPPZ );
LWORK = std::static_cast<int>(work.front());
try {
   work.resize(LWORK);
} catch (const std::exception& e) {
   std::cerr << "Failed to allocate work space." << std::endl;
   exit 1;
}
// Actual calculation
info = LAPACKE_dsyevr( LAPACK_COL_MAJOR, 'V', 'A', 'U', N, A, LDA,
                    vl, vu, il, iu, abstol, &m, W, Z, LDZ, ISUPPZ );

Completely forgot about one the most important things I wanted to mention: Make sure that you really need double precision. LAPACKE_fsyevr should need have the RAM and be about twice as fast.