Search code examples
c++matrix-multiplicationlapack

How does the dimention argument of `sgemm` work?


I'm trying to understand the documentation for sgemm as I am transitioning code from using this library to a different library.

The function prototype is

sgemm   (   character   TRANSA,
        character   TRANSB,
        integer     M,
        integer     N,
        integer     K,
        real    ALPHA,
        real, dimension(lda,*)      A,
        integer     LDA,
        real, dimension(ldb,*)      B,
        integer     LDB,
        real    BETA,
        real, dimension(ldc,*)      C,
        integer     LDC 
    )   

I am having trouble understanding the role or LDA and LDB. The documentation says

LDA is INTEGER

On entry, LDA specifies the first dimension of A as declared in the calling (sub) program. When TRANSA = 'N' or 'n' then LDA must be at least max( 1, m ), otherwise LDA must be at least max( 1, k ).

What does it mean that it specifies the first dimension of A? Is this like switching between row and column major? Or is this slicing the tensor?


Solution

  • LD stands for leading dimension. BLAS is originally a library of Fortran 77 subroutines and in Fortran matrices are stored column-wise: A(i,j) is immediately followed in memory by A(i+1,j), which is opposite of C/C++ where a[i][j] is followed by a[i][j+1]. In order to access element A(i,j) of a matrix that has dimensions A(LDA,*) (which reads as LDA rows and an unspecified number of columns), you need to look (j-1)*LDA + (i-1) elements from the beginning of the matrix (Fortran arrays are 1-indexed by default), therefore you need to know the value of LDA. You don't need to know the actual number of columns, therefore the * in the dummy argument.

    It is the same in C/C++. If you have a 2D array declared as a[something][LDA], then element a[i][j] is located i*LDA + j positions after the start of the array, and you only need to know LDA - the value of something does not affect the calculation of the address of a[i][j].

    Although GEMM operates on an M x K matrix A, the actual data may be embedded in a bigger matrix that is LDA x L, where LDA >= M and L >= K, therefore the LDA is specified explicitly. The same applies to LDB and LDC.

    BLAS was developed many years ago when computer programming was quite different than what it is today. Memory management, in particular, was not as flexible as it is nowadays. Allocating one big matrix and then using and reusing portions of it to store smaller matrices was the norm. Also, GEMM is extensively used in, e.g., iterative algorithms that work on various sub-matrices and it is faster to keep the data in the original matrix and just specify the sub-matrix location and dimension, so you need to provide both dimensions.

    Starting with Fortran 90, the language has array slicing and automatic array descriptors that allow one to discover both the dimensions of a slice and those of the bigger matrix, so if GEMM was written in Fortran 90 or later, it wouldn't be that verbose in respect to its arguments. But even if that was the case, C doesn't have array descriptors, so you'll still have to provide all those arguments in order to make GEMM callable from C. In C++, one can hide the descriptor inside a matrix class, and many math libraries actually do so (for example, Scythe).