I'm calculating the eigenvalues of a dense non symmetric matrix A. For that purpose I use xGEHRD and xHSEQR Lapack routines in order to calculate first the upper Hessenberg form of A and then to calculate the only the eigenvalues of the obtained matrix.
Both routines require the parameter LWORK and both provide a mechanism to calculate its optimal value. I believe that this parameter is related to an internal blocking of buffer technique, but I don't know how it is determined.
Using the query mechanism to obtain the optimal value of LWORK the workflow should be like:
int LWORK = -1;
float* OPT_LWORK = (float*) malloc( sizeof(float));
sgehrd_ (..., OPT_LWORK ,&LWORK, ...) // query optimal value for sgehrd
LWORK = (int) OPT_WORK
float* WORK = (float*) malloc( (int) sizeof(float) * LWORK );
sgehrd_ (..., WORK ,&LWORK, ...) // calculate Hessenberg
int LWORK = -1;
shseqr_ (..., OPT_LWORK ,&LWORK, ...) // query optimal value for shseqr
LWORK = (int) OPT_WORK
float* WORK = // possibly realloc with the new LWORK value
shseqr_ (..., WORK ,&LWORK, ...) // finally obtain eigenvalues
I've done some testing, obtaining always the same optimal values for the dimension of WORK array. If the values were the same, I could simplify my code a lot (no need for realloc and only one call to determine the value of LWORK, less error checking...).
My question is, for the same matrix and the same values of ILO and IHI, can I assume that the values are going to be equal the both routines?
Looking at sgehrd.f, it seems that the optimal size of WORK
for routine sgehrd()
is N*NB
, where NB
is
NB = MIN( NBMAX, ILAENV( 1, 'SGEHRD', ' ', N, ILO, IHI, -1 ) )
where NBMAX=64
. Hence, the optimal LWORK
depends on N
, ILO
and IHI
.
Looking at shseqr.f, the computation of the optimal length LWORK
is more complex : the routine slaqr0()
is called... But the documentation in the file states :
If LWORK = -1, then SHSEQR does a workspace query. In this case, SHSEQR checks the input parameters and estimates the optimal workspace size for the given values of N, ILO and IHI. The estimate is returned in WORK(1). No error message related to LWORK is issued by XERBLA. Neither H nor Z are accessed.
The optimal length of WORK
may be different for sgehrd()
and shseqr()
. Here is an example :
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
extern void sgehrd_(int *n,int* ilo, int* ihi, float* a,int* lda,float* tau,float* work, int* lwork,int* info);
extern void shseqr_(char* job,char* compz,int *n,int* ilo, int* ihi,float* h,int* ldh,float* wr,float* wi,float* z,int* ldz,float* work, int* lwork,int* info);
int main()
{
int n=10;
int ilo=n;
int ihi=n;
float* a=malloc(sizeof(float)*n*n);
int lda=n;
float* tau=malloc(sizeof(float)*(n-1));
int info;
char job='S';
char compz='I';
float* h=malloc(sizeof(float)*n*n);
int ldh=n;
float* wr=malloc(sizeof(float)*(n));
float* wi=malloc(sizeof(float)*(n));
float* z=malloc(sizeof(float)*n*n);
int ldz=n;
int LWORK = -1;
float* OPT_LWORK =(float*) malloc( sizeof(float));
sgehrd_ (&n,&ilo, &ihi, a,&lda,tau,OPT_LWORK ,&LWORK,&info); // query optimal value for sgehrd
if(info!=0){printf("sgehrd lwork=-1 : failed\n");}
LWORK = (int) OPT_LWORK[0];
printf("sgehrd,length of optimal work : %d \n",LWORK);
float* WORK = (float*) malloc( (int) sizeof(float) * LWORK );
sgehrd_ (&n,&ilo, &ihi, a,&lda,tau,WORK ,&LWORK,&info);// calculate Hessenberg
if(info!=0){printf("sgehrd execution : failed\n");}
LWORK = -1;
shseqr_ (&job,&compz,&n,&ilo, &ihi, h,&ldh, wr, wi,z,&ldz, OPT_LWORK ,&LWORK, &info); // query optimal value for shseqr
if(info!=0){printf("shgeqr lwork=-1 : failed\n");}
LWORK = (int) OPT_LWORK[0];
printf("shseqr,length of optimal work : %d \n",LWORK);
WORK = realloc(WORK,(int) sizeof(float) * LWORK );// possibly realloc with the new LWORK value
shseqr_ (&job,&compz,&n,&ilo, &ihi, h,&ldh, wr, wi,z,&ldz, WORK ,&LWORK, &info); // finally obtain eigenvalues
if(info!=0){printf("shgeqr execution : failed\n");}
free(OPT_LWORK);
free(WORK);
free(a);
free(tau);
free(h);
free(wr);
free(wi);
free(z);
}
Compile by gcc main.c -o main -llapack -lblas
My output is :
sgehrd,length of optimal work : 320
shseqr,length of optimal work : 10