Search code examples
clinear-algebralapackblas

BLAS matrix-vector multiplication vs vector-matrix multiplication. One works; the other fails


I succeed in matrix-vector multiplication when working with cgemv a BLAS lvl 2 function in Lapack, but when I try the transpose, I get a wrong answer. Can you instruct me in my error? (I'm actually using the C wrapper, not FORTRAN.)

I'm attempting

| 4+i    3 |   | 3+2i |          | 4+i    3 |^T   | 3+2i |
| 14+3i  2 | * | 2    |  (AND)   | 14+3i  2 |   * | 2    |

To be clear, the first one succeeds. The second one gives incorrect output.

/* config variables */
char normal = 'N';
char transpose = 'T';
integer m = 2;
complex alpha = {r:1,i:0};
complex beta = {r:0,i:0};
integer one = 1;
/* data buffers */
complex a[4] = {(complex){r:4,  i:1},(complex){r:14, i:3},(complex){r:3,  i:0},(complex){r:6,  i:0}};
complex x[2] = {(complex){r:3, i:2},(complex){r:2, i:0}};
complex y[2];
/* execution */
cgemv_(&normal, &m, &m, &alpha, &a[0], &m, &x[0], &one, &beta, &y[0], &one);
cgemv_(&transpose, &m, &m, &alpha, &a[0], &m, &x[0], &one, &beta, &y[0], &one);

After the first cgemv_ call, y holds 16.0000+11.0000i 48.0000+37.0000i, which MATLAB confirms to be correct.

But after the second cgemv_ call, y holds 38.0000+17.0000i 21.0000+6.0000i, whereas MATLAB says it should be 42.0000-1.0000i 21.0000+6.0000i. I've no idea what could be awry.


Solution

  • Since a 2 vector is multiplied by a 2x2 matrix, performing the operation using a pen and a piece of paper is not too complex. If the transposed matrix is used:

    (4+i)*(3+2i)+(14+3i)*2=38+17i

    Curiously:

    (4-i)*(3+2i)+(14-3i)*2=42-i

    So the output of MATLAB is likely the output obtained by using the complex conjugate transpose. The same output can be produced by BLAS if the TRANS parameter of cgemv_ is set to 'C'.

    Here is a sample code based on ours showing what BLAS really computes for the different values of TRANS. It can be computed by gcc main.c -o main -lblas -Wall.

    #include <stdlib.h>
    #include <stdio.h>
    
    #include <complex.h>
    
    
    extern int cgemv_(char* trans, int * m, int * n, float complex* alpha, float complex* A, int * lda,float complex * x, int* incx, float complex * beta, float complex * y,int* incy);
    
    int main(void) {
    
        /* config variables */
        char normal = 'N';
        char transpose = 'T';
        char ctranspose = 'C';
        int m = 2;
        float complex alpha = 1.0+0.*I;
        float complex beta = 0.0+0.*I;
        int one = 1;
        /* data buffers */
        float complex a[4]= {4+1.*I,14+3.*I,3.+0.*I,6.+0.*I};
        float complex x[2] = {3.+2.*I,2+0.*I};
        float complex y[2];
        /* execution */
    
        float complex ye[2];
        ye[0]=a[0]*x[0]+a[2]*x[1];
        ye[1]=a[1]*x[0]+a[3]*x[1];
    
        cgemv_(&normal, &m, &m, &alpha, &a[0], &m, &x[0], &one, &beta, &y[0], &one);
    
        printf("N\n");
        printf("y[0]=%2.6f + %2.6f I expected %6f + %6f I\n",creal(y[0]),cimag(y[0]),creal(ye[0]),cimag(ye[0])); 
        printf("y[1]=%2.6f + %2.6f I expected %6f + %6f I\n",creal(y[1]),cimag(y[1]),creal(ye[1]),cimag(ye[1])); 
    
        //float complex ye[2];
        ye[0]=a[0]*x[0]+a[1]*x[1];
        ye[1]=a[2]*x[0]+a[3]*x[1];
    
        cgemv_(&transpose, &m, &m, &alpha, &a[0], &m, &x[0], &one, &beta, &y[0], &one);
    
        printf("T\n");
        printf("y[0]=%2.6f + %2.6f I expected %6f + %6f I\n",creal(y[0]),cimag(y[0]),creal(ye[0]),cimag(ye[0])); 
        printf("y[1]=%2.6f + %2.6f I expected %6f + %6f I\n",creal(y[1]),cimag(y[1]),creal(ye[1]),cimag(ye[1]));  
    
        ye[0]=conj(a[0])*x[0]+conj(a[1])*x[1];
        ye[1]=conj(a[2])*x[0]+conj(a[3])*x[1];
    
        cgemv_(&ctranspose, &m, &m, &alpha, &a[0], &m, &x[0], &one, &beta, &y[0], &one);
    
        printf("C\n");
        printf("y[0]=%2.6f + %2.6f I expected %6f + %6f I\n",creal(y[0]),cimag(y[0]),creal(ye[0]),cimag(ye[0])); 
        printf("y[1]=%2.6f + %2.6f I expected %6f + %6f I\n",creal(y[1]),cimag(y[1]),creal(ye[1]),cimag(ye[1]));  
    
    
        return 0;
    }