I want to multiply matrices by D*W', where W' is a transposed version of W.
While I'll use DGEMM I'l figured out with the help of @IanBush that the LDB in this case should be number of rows of matrix W instead of number of columns. The code for this case is
Call dgemm('n', 't', N1, M1, N1, 1.0_wp, D, N1, W, M1, 0.0_wp, c, n1)
where n1 and m1 are dimensions of my matrices
Dimensions of the matrices:
W = M1*N1
D = N1*N1
As in official documentation it says
LDB is INTEGER
On entry, LDB specifies the first dimension of B as declared
in the calling (sub) program. When TRANSB = 'N' or 'n' then
LDB must be at least max( 1, k ), otherwise LDB must be at
least max( 1, n ).
Why does this happen?
The call to dgemm contains two sets of integers. The first is to define the dimensions of the matrices as described by the maths. Looking at the documentation for BLAS at http://www.netlib.org/lapack/explore-html/d7/d2b/dgemm_8f.html we can see the arguments to the routine defined as (and formatting it in a way that helps me remember how it works):
Subroutine dgemm (TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, &
B, LDB, &
BETA, C, LDC)
The set of integers to define the mathematical dimensions is M, N, K
In this set
C
C
C
So M, N, K
are purely to do with the maths of the problem, nothing else. But we are on a computer, so there is a second issue - how is each of the two dimensional matrices laid out in the computer's memory, which is an inherently one dimensional object? Now Fortran stores its objects in Column Major order. This means that in memory if you have element A( i, j ) the next memory location will hold A( i + 1, j ) if we are not an the end of a column, or A( 1, j + 1 ) if we are. So the matrix is laid out with the 'first index moving fastest'. To define this layout of a 2 dimensional object all we need to tell the program is how many elements of A you need to jump over to go from A( i, j ) to A( i, j + 1 ) - and it is this number which is the 'leading dimension', LDA, LDB, LDC
in the documentation. And thus it is simply the number of rows in the matrix as declared or allocated.
Now let's look at the documentation you quote
LDB is INTEGER
On entry, LDB specifies the first dimension of B as declared
in the calling (sub) program. When TRANSB = 'N' or 'n' then
LDB must be at least max( 1, k ), otherwise LDB must be at
least max( 1, n ).
LDB is the number of rows in the matrix B is it is declared. Thus
So that answers most of it, and if you are always multiplying all of one matrix as declared by all of a second matrix the leading dimension will simply be the dirst dimension of each matrix as declared. But the 'at least' means you can multiply bits of arrays together. Consider the following, and by carefully separating those integers which define the maths, and those which define the memory layout, can you work out why it does what it does? Note that a( 2, 2 ) in the argument list means we "start" the matrix at this element - now think carefully what the leading dimension tells you and you should be able to sort out how this works.
ian@eris:~/work/stack$ cat matmul2.f90
Program sirt
Integer, Parameter :: wp = Kind( 1.0d0 )
Real( wp ), Dimension( 1:5, 1:5 ) :: A
Real( wp ), Dimension( 1:3, 1:3 ) :: B
Real( wp ), Dimension( 1:4, 1:4 ) :: C
Integer :: i
A = 0.0_wp
Do i = 1, 5
A( i, i ) = Real( i, wp )
End Do
Write( *, * ) 'A = '
Do i = 1, Size( A, Dim = 1 )
Write( *, '( 100( f4.1, 1x ) )' ) A( i, : )
End Do
B = 0.0_wp
B( 1, 1 ) = 1.0_wp
B( 3, 2 ) = 1.0_wp
B( 2, 3 ) = 1.0_wp
Write( *, * ) 'B = '
Do i = 1, Size( B, Dim = 1 )
Write( *, '( 100( f4.1, 1x ) )' ) B( i, : )
End Do
! Lazy - should really just initialise only the bits of C that are NOT touched
! by the dgemm
C = 0.0_wp
Call dgemm('N', 'N', 3, 3, 3, 1.0_wp, A( 2, 2 ), Size( A, Dim = 1 ), &
B , Size( B, Dim = 1 ), &
0.0_wp, C , Size( C, Dim = 1 ) )
Write( *, * ) 'C after dgemm'
Write( *, * ) 'B = '
Do i = 1, Size( C, Dim = 1 )
Write( *, '( 100( f4.1, 1x ) )' ) C( i, : )
End Do
End Program sirt
ian@eris:~/work/stack$ gfortran-8 -std=f2008 -fcheck=all -Wall -Wextra -pedantic -O matmul2.f90 -lblas
/usr/bin/ld: warning: libgfortran.so.4, needed by //usr/lib/x86_64-linux-gnu/libopenblas.so.0, may conflict with libgfortran.so.5
ian@eris:~/work/stack$ ./a.out
A =
1.0 0.0 0.0 0.0 0.0
0.0 2.0 0.0 0.0 0.0
0.0 0.0 3.0 0.0 0.0
0.0 0.0 0.0 4.0 0.0
0.0 0.0 0.0 0.0 5.0
B =
1.0 0.0 0.0
0.0 0.0 1.0
0.0 1.0 0.0
C after dgemm
B =
2.0 0.0 0.0 0.0
0.0 0.0 3.0 0.0
0.0 4.0 0.0 0.0
0.0 0.0 0.0 0.0
This use for multiplying a part of two larger matrices together is surprisingly common, especially in blocked linear algebra algorithms, and the separation of the mathematical and layout dimensions allows you to do it