Search code examples
loopsmultidimensional-arrayfortran

Optimizing array multiplication in fortran


I am trying to optimize the array multiplication of the following type in fortran:

    do i = 1, n
    do j = 1, n
    do k = 1, n
            do i1 = 1, n
            do j1 = 1, n
            do k1 = 1, n
                    B(k1,j1,i1) = B(k1,j1,i1) + &
                    A(k,j,i)*v1(k,k1)*v2(j,j1)*v3(i,i1)
            enddo
            enddo
            enddo
    enddo
    enddo
    enddo

Here A and B are of size (n,n,n) while v1, v2, and v3 are (n,n). It seems that there ought to be vectorization of some sort that would allow me to speed up this evaluation. I tried different ordering of the indices. Is there a faster way of calculating B?

EDIT:

Alright, one can use matmul to obtain about 10x speed up:

    do i = 1, n
    do i1 = 1, n
    do j = 1, n
    do j1 = 1, n
            B(:,j1,i1) = B(:,j1,i1) + &   
            matmul(A(:,j,i),v1)*v2(j,j1)*v3(i,i1)                                             
    enddo
    enddo
    enddo
    enddo

Can we do even better?


Solution

  • You can at the very least reduce two of the loop pairs to matrix multiplication, using either matmul or BLAS/LAPACK, e.g.

    do i1=1,n
      do i=1,n
        B(:,:,i1) = B(:,:,i1) &
                & + matmul(matmul(transpose(v1), A(:,:,i)), v2) * v3(i,i1)
      enddo
    enddo
    

    You can almost certainly speed this up further by caching the results of the double matmul, e.g.

    real :: intermediate(n,n,n)
    
    do i=1,n
      intermediate(:,:,i) = matmul(matmul(transpose(v1), A(:,:,i)), v2)
    enddo
    
    do i1=1,n
      do i=1,n
        B(:,:,i1) = B(:,:,i1) + intermediate(:,:,i) * v3(i,i1)
      enddo
    enddo
    

    Which might in turn be sped up with more matmul, e.g.

    real :: intermediate(n,n,n)
    
    do i=1,n
      intermediate(:,:,i) = matmul(matmul(transpose(v1), A(:,:,i)), v2)
    enddo
    
    do j1=1,n
      B(:,j1,:) = matmul(intermediate(:,j1,:), v3))
    enddo