Tags: matrix, fortran, hpc, lapack, blas

BLAS LDB using DGEMM


I want to compute the matrix product D*W', where W' is the transpose of W.

I'm using DGEMM, and with the help of @IanBush I figured out that LDB in this case should be the number of rows of matrix W, not the number of columns. The code for this case is

  Call dgemm( 'N', 'T', N1, M1, N1, 1.0_wp, D, N1, W, M1, 0.0_wp, C, N1 )

where N1 and M1 are the dimensions of my matrices.

Dimensions of the matrices:
W = M1*N1 (M1 rows, N1 columns)
D = N1*N1
so the result D*W' is N1*M1

The official documentation says

      LDB is INTEGER
       On entry, LDB specifies the first dimension of B as declared
       in the calling (sub) program. When  TRANSB = 'N' or 'n' then
       LDB must be at least  max( 1, k ), otherwise  LDB must be at
       least  max( 1, n ).

Why does this happen?


Solution

  • The call to dgemm contains two sets of integers. The first set defines the dimensions of the matrices as described by the maths. Looking at the documentation for BLAS at http://www.netlib.org/lapack/explore-html/d7/d2b/dgemm_8f.html we can see the arguments to the routine defined as (formatted in a way that helps me remember how it works):

             Subroutine dgemm (TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, &
                                                               B, LDB, & 
                                                         BETA, C, LDC)
    

    The set of integers that defines the mathematical dimensions is M, N, K. In this set:

    • M is the first mathematical dimension of the result matrix C
    • N is the second mathematical dimension of the result matrix C
    • K is the length of the dot product required to produce an individual element of the result matrix C

    So M, N and K are purely to do with the maths of the problem, nothing else. But we are on a computer, so there is a second issue: how is each of the two dimensional matrices laid out in the computer's memory, which is an inherently one dimensional object?

    Fortran stores its arrays in column major order. This means that in memory the location after element A( i, j ) holds A( i + 1, j ) if we are not at the end of a column, or A( 1, j + 1 ) if we are. So the matrix is laid out with the 'first index moving fastest'. To describe this layout of a two dimensional object, all the program needs to know is how many elements of A you have to jump over to get from A( i, j ) to A( i, j + 1 ) - and it is this number which is the 'leading dimension': LDA, LDB and LDC in the documentation. It is simply the number of rows in the matrix as declared or allocated.
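    As a minimal sketch of that storage rule (this little program is my own illustration, not part of the original answer), RESHAPE shows that a 1-D sequence fills a 2-D Fortran array column by column, so reading the array back as 1-D recovers the memory order:

```fortran
Program layout
  ! Sketch: demonstrate Fortran's column major storage order.
  Implicit None
  Integer, Dimension( 1:2, 1:3 ) :: A
  Integer, Dimension( 1:6 )      :: flat

  ! The source elements fill A one COLUMN at a time:
  !   A = 1 3 5
  !       2 4 6
  A = Reshape( [ 1, 2, 3, 4, 5, 6 ], [ 2, 3 ] )

  ! Flattening A again gives the order it occupies memory in
  flat = Reshape( A, [ 6 ] )
  Write( *, '( 6( i0, 1x ) )' ) flat   ! prints 1 2 3 4 5 6
End Program layout
```

    Here the leading dimension of A is 2: to step from A( i, j ) to A( i, j + 1 ) you jump over 2 elements in memory.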

    Now let's look at the documentation you quote

      LDB is INTEGER
       On entry, LDB specifies the first dimension of B as declared
       in the calling (sub) program. When  TRANSB = 'N' or 'n' then
       LDB must be at least  max( 1, k ), otherwise  LDB must be at
       least  max( 1, n ).
    

    LDB is the number of rows in the matrix B as it is declared. Thus:

    1. If B is NOT transposed, each column of B will be involved in a dot product with a row or column of A, depending on the transpose option for A. The mathematical dimensions say the dot product is k long, so the number of rows in B must be at least k for the maths to work, and hence LDB must be at least k.
    2. If B IS transposed, the mathematical first dimension of B is also the mathematical second dimension of the result matrix, which is given by N. So LDB in this case must be at least N.
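    Applying this to the original D*W' case (the names D, W, M1, N1 come from the question; the check against the intrinsic matmul is my addition, as a sketch rather than a definitive test):

```fortran
Program check_dwt
  ! Sketch: compute C = D * W' with dgemm and compare against matmul.
  ! D is N1 x N1, W is M1 x N1, so W' is N1 x M1 and C is N1 x M1.
  Implicit None
  Integer, Parameter :: wp = Kind( 1.0d0 )
  Integer, Parameter :: n1 = 4, m1 = 3

  Real( wp ), Dimension( 1:n1, 1:n1 ) :: D
  Real( wp ), Dimension( 1:m1, 1:n1 ) :: W
  Real( wp ), Dimension( 1:n1, 1:m1 ) :: C

  Call Random_number( D )
  Call Random_number( W )

  ! Maths integers: M = n1 (rows of C), N = m1 (cols of C),
  ! K = n1 (length of each dot product).
  ! Layout integers: LDA = n1 (rows of D), LDB = m1 (rows of W as
  ! declared - NOT its number of columns), LDC = n1 (rows of C).
  Call dgemm( 'N', 'T', n1, m1, n1, 1.0_wp, D, n1, W, m1, 0.0_wp, C, n1 )

  Write( *, * ) 'Max difference from matmul: ', &
       Maxval( Abs( C - Matmul( D, Transpose( W ) ) ) )
End Program check_dwt
```

    Compiled with something like gfortran check_dwt.f90 -lblas, the printed difference should be at round-off level.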

    So that answers most of it: if you are always multiplying all of one matrix, as declared, by all of a second matrix, the leading dimension is simply the first dimension of each matrix as declared. But the 'at least' means you can multiply parts of larger arrays together. Consider the following, and by carefully separating the integers which define the maths from those which define the memory layout, see if you can work out why it does what it does. Note that A( 2, 2 ) in the argument list means we "start" the matrix at this element - now think carefully about what the leading dimension tells you and you should be able to sort out how this works.

    ian@eris:~/work/stack$ cat matmul2.f90
    Program sirt
    
      Integer, Parameter :: wp = Kind( 1.0d0 )
    
      Real( wp ), Dimension( 1:5, 1:5 ) :: A
      Real( wp ), Dimension( 1:3, 1:3 ) :: B
      Real( wp ), Dimension( 1:4, 1:4 ) :: C
    
      Integer :: i
    
      A = 0.0_wp
      Do i = 1, 5
         A( i, i ) = Real( i, wp )
      End Do
      Write( *, * ) 'A = '
      Do i = 1, Size( A, Dim = 1 )
         Write( *, '( 100( f4.1, 1x ) )' ) A( i, : )
      End Do
    
      B = 0.0_wp
      B( 1, 1 ) = 1.0_wp
      B( 3, 2 ) = 1.0_wp
      B( 2, 3 ) = 1.0_wp
      Write( *, * ) 'B = '
      Do i = 1, Size( B, Dim = 1 )
         Write( *, '( 100( f4.1, 1x ) )' ) B( i, : )
      End Do
    
      ! Lazy - should really just initialise only the bits of C that are NOT touched
      ! by the dgemm
      C = 0.0_wp
    
      Call dgemm('N', 'N', 3, 3, 3, 1.0_wp, A( 2, 2 ), Size( A, Dim = 1 ), &
                                            B        , Size( B, Dim = 1 ), &
                                    0.0_wp, C        , Size( C, Dim = 1 ) )
    
      Write( *, * ) 'C after dgemm'
      Write( *, * ) 'C = '
      Do i = 1, Size( C, Dim = 1 )
         Write( *, '( 100( f4.1, 1x ) )' ) C( i, : )
      End Do
    
    
    End Program sirt
    ian@eris:~/work/stack$ gfortran-8  -std=f2008 -fcheck=all -Wall -Wextra -pedantic -O matmul2.f90 -lblas
    /usr/bin/ld: warning: libgfortran.so.4, needed by //usr/lib/x86_64-linux-gnu/libopenblas.so.0, may conflict with libgfortran.so.5
    ian@eris:~/work/stack$ ./a.out
     A = 
     1.0  0.0  0.0  0.0  0.0
     0.0  2.0  0.0  0.0  0.0
     0.0  0.0  3.0  0.0  0.0
     0.0  0.0  0.0  4.0  0.0
     0.0  0.0  0.0  0.0  5.0
     B = 
     1.0  0.0  0.0
     0.0  0.0  1.0
     0.0  1.0  0.0
     C after dgemm
     C = 
     2.0  0.0  0.0  0.0
     0.0  0.0  3.0  0.0
     0.0  4.0  0.0  0.0
     0.0  0.0  0.0  0.0
    

    This use of dgemm to multiply parts of two larger matrices together is surprisingly common, especially in blocked linear algebra algorithms, and it is the separation of the mathematical dimensions from the layout dimensions that allows you to do it.
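    As a hedged sketch of that blocked idea (the routine name, block size handling and the assumption that nb divides n are all mine), each nb x nb block of C is built up by dgemm calls on sub-blocks of A and B, while LDA, LDB and LDC stay equal to the declared number of rows of the full arrays:

```fortran
Subroutine blocked_matmul( n, nb, A, B, C )
  ! Sketch: C = A * B computed block by block via dgemm.
  ! A, B, C are n x n; nb is the block size (assumed to divide n).
  Implicit None
  Integer, Parameter :: wp = Kind( 1.0d0 )
  Integer, Intent( In ) :: n, nb
  Real( wp ), Dimension( 1:n, 1:n ), Intent( In    ) :: A, B
  Real( wp ), Dimension( 1:n, 1:n ), Intent( InOut ) :: C

  Integer :: i, j, k

  C = 0.0_wp
  Do j = 1, n, nb
     Do k = 1, n, nb
        Do i = 1, n, nb
           ! Maths integers are the BLOCK sizes (nb x nb, dot length nb);
           ! layout integers are still n, the declared rows of the full
           ! arrays. beta = 1 accumulates the partial products into C.
           Call dgemm( 'N', 'N', nb, nb, nb, 1.0_wp, A( i, k ), n, &
                       B( k, j ), n, 1.0_wp, C( i, j ), n )
        End Do
     End Do
  End Do
End Subroutine blocked_matmul
```

    Exactly as with the A( 2, 2 ) example above, passing A( i, k ) tells dgemm where a block "starts", and the leading dimension n tells it how far apart that block's columns sit in memory.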