Using fewer matrices with BLAS

I'm quite new to BLAS (I'm using OpenBLAS with C++ in Visual Studio).

I know that dgemm computes C <- alpha * op(A) * op(B) + beta * C.
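Written out element by element, with op(A) of size M x K and op(B) of size K x N, every entry of column j of C depends on all K entries of column j of op(B):

```latex
C_{ij} \leftarrow \alpha \sum_{l=1}^{K} \operatorname{op}(A)_{il}\,\operatorname{op}(B)_{lj} + \beta\, C_{ij},
\qquad i = 1,\dots,M,\quad j = 1,\dots,N
```

If B itself is passed as C (which, in the no-transpose case, requires M = K as in the question), then C_{ij} and B_{ij} occupy the same memory, so writing any entry of column j before the whole column has been read changes an input that is still needed.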

To save an allocation, I tried writing the result into B itself: B <- 1 * op(A) * op(B) + 0 * B.

In other words, I set beta = 0 and passed B again in the position of C, but the result comes back as all zeros.

Is there a way to make this work?

The code that I'm using:

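```cpp
#include <cblas.h> // CBLAS interface shipped with OpenBLAS

int main()
{
    double* A = new double[3 * 3]; // 3 rows x 3 columns, column-major
    A[0] = 8; A[1] = 3; A[2] = 4;  // column 0
    A[3] = 1; A[4] = 5; A[5] = 9;  // column 1
    A[6] = 6; A[7] = 7; A[8] = 2;  // column 2

    double* v = new double[3];     // 3 rows x 1 column
    v[0] = 3; v[1] = 5; v[2] = 2;

    double* foo = new double[3];   // 3 rows x 1 column

    // foo <- 1 * A * v + 0 * foo
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
        3, 1, 3,
        1,
        A, 3,
        v, 3,
        0,
        foo, 3); // makes foo = [41 ; 48 ; 61], right

    // Same product, but with v passed as both B and C: v <- 1 * A * v + 0 * v
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
        3, 1, 3,
        1,
        A, 3,
        v, 3,
        0,
        v, 3); // makes v = [0 ; 0 ; 0], wrong

    delete[] A;
    delete[] v;
    delete[] foo;
}
```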
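With CblasColMajor, the nine values of A fill the matrix column by column, so the first call computes the product below, which confirms the expected foo:

```latex
A = \begin{bmatrix} 8 & 1 & 6 \\ 3 & 5 & 7 \\ 4 & 9 & 2 \end{bmatrix},
\qquad
A v = \begin{bmatrix}
  8 \cdot 3 + 1 \cdot 5 + 6 \cdot 2 \\
  3 \cdot 3 + 5 \cdot 5 + 7 \cdot 2 \\
  4 \cdot 3 + 9 \cdot 5 + 2 \cdot 2
\end{bmatrix}
= \begin{bmatrix} 41 \\ 48 \\ 61 \end{bmatrix}
```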

Solution

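The BLAS dgemm documentation specifies that only the C parameter is used for both input and output: it is overwritten by the result. A and B are input-only and are documented as unchanged on exit, so an implementation is free to assume that writing to C never modifies B. Passing the same array as both B and C breaks that assumption, and the result is undefined.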

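The all-zero result is most likely not the outcome of an error check, since all the size and leading-dimension arguments in that call are valid. When beta = 0, the documentation says C need not be set on input, so implementations typically begin by clearing C: the reference implementation zeroes each column of C before accumulating into it, and OpenBLAS's generic level-3 driver likewise scales C by beta before running its multiplication kernel. Because C and B are the same array here, that step zeroes v before it is read, and A * 0 = 0. To get the right answer, write the result into a separate buffer (such as foo) and copy it back into B if you need it there.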