Search code examples
c++performanceiterationlapack

LAPACK function gets slower after first iteration


I am implementing an iterative algorithm that uses LAPACK for PSD Projections (doesn't really matter, the point is I'm calling this function over and over):

void useLAPACK(vector<double>& x, int N){

    /* Locals */
    int n = N, il, iu, m, lda = N, ldz = N, info, lwork, liwork;
    double abstol;
    double vl,vu;
    int iwkopt;
    int* iwork;
    double wkopt;
    double* work;
    /* Local arrays */
    int isuppz[N];
    double w[N], z[N*N];

    /* Negative abstol means using the default value */
    abstol = -1.0;
    /* Set il, iu to compute NSELECT smallest eigenvalues */
    vl = 0;
    vu = 1.79769e+308;
    /* Query and allocate the optimal workspace */
    lwork = -1;
    liwork = -1;
    dsyevr_( (char*)"Vectors", (char*)"V", (char*)"Upper", &n, &x[0], &lda, &vl, &vu, &il, &iu,
             &abstol, &m, w, z, &ldz, isuppz, &wkopt, &lwork, &iwkopt, &liwork,
             &info );
    lwork = (int)wkopt;
    work = (double*)malloc( lwork*sizeof(double) );
    liwork = iwkopt;
    iwork = (int*)malloc( liwork*sizeof(int) );
    /* Solve eigenproblem */
    dsyevr_( (char*)"Vectors", (char*)"V", (char*)"Upper", &n, &x[0], &lda, &vl, &vu, &il, &iu,
             &abstol, &m, w, z, &ldz, isuppz, work, &lwork, iwork, &liwork,
             &info );
    /* Check for convergence */
    if( info > 0 ) {
        printf( "The dsyevr (useLAPACK) failed to compute eigenvalues.\n" );
        exit( 1 );
    }
    /* Print the number of eigenvalues found */
    //printf( "\n The total number of eigenvalues found:%2i\n", m );
    //print_matrix( "Selected eigenvalues", 1, m, w, 1 );
    //print_matrix( "Selected eigenvectors (stored columnwise)", n, m, z, ldz );
    //Eigenvectors are returned as stacked columns (in total m)
    //Outer sum calculation is fastest.
    for(int i = 0; i < N*N; ++i) x[i] = 0;
    double lambda;
    double vrow1,vrow2;
    for(int col = 0; col < m; ++col) {
        lambda = w[col];
        for (int row1 = 0; row1 < N; ++row1) {
            vrow1 = z[N*col+row1];
            for(int row2 = 0; row2 < N; ++row2){
                vrow2 = z[N*col+row2];
                x[row1*N+row2]  += lambda*vrow1*vrow2;
            }
        }
    }
    free( (void*)iwork );
    free( (void*)work );
}

Now my time measurements show that the first call takes about 4ms, but then it increases to 100ms. Is there a good explanation for this in this code? x is the same vector every time.


Solution

  • I think I have figured out the problem. My algorithm starts with the zero matrix and afterwards the number of positive eigenvalues are more or less half positive half negative. dsyevr only calculates positive eigenvalues and corresponding eigenvectors with those arguments. I suppose if all are zero it doesn't really have to calculate any eigenvectors, which makes the algorithm much faster. Thanks for all the answers and sorry about the missing information.