Search code examples
optimizationfortranmatrix-multiplicationdo-loops

How to simplify complex nested do-loops?


I have very tedious task to optimize some ancient Fortran77 code. Honestly, I don't know fortran at all. I know how loops works and how to multiply matrices. I also know that this loop can be optimized to few 3-4 nested loops:

    do i = 1, nocca
     do j = 1, nocca
       do k = 1, noccb                                                                                                               
         do l = 1, noccc
           do m = 1, nva
             do n =1, nvb
               saps = oab(j, n+noccb)
               sbap = oab(j, k)
               sac = oac(i, l)
               scr = oac(m + nocca, l)
               im = i + nocca*(m-1)  
               kn = k + noccb*(n-1)
               imkn = im + oava*(kn-1)
               vrsab = ovovab(imkn)
   demp3 = demp3 + 2.0d0*vrsab*(2.0d0*saps*sbap*sac*scr)
             end do
           end do
         end do
       end do
     end do
   end do

I was trying to calculate sapssac in the separate loop and similarly sacscr:

c      Calculate saps * sbap 
       do j = 1, nocca
         do k  = 1, noccb
           do n = 1, nvb
             saps = oab(j, n + noccb)
             sbap = oab(j, k)   
             saps_sbap(j, k) = saps_sbap(j, k) + saps*sbap
           end do
         end do
       end do

c      Calculate sac_scr     
       do i = 1, nocca
         do l = 1, noccc
           do m = 1, nva
             sac = oac(i, l)
             scr = oac(m + nocca, l)
             sac_scr(i, l) = sac_scr(i, l) + sac*scr
           end do
         end do
       end do

Finally I would like to write the last part to calculate demp3 but there are 5 indices not 4 as I expected. Maybe I'm doing this entirely wrong?

Any suggestions? hints?

Thanks in advance!


Solution

  • It is clearly easier to downwote than try to understand a problem. I found optimal solution by myself. Here it is:

    1. Split large sum into two components. Let's consider only the first one:

      demp3 = demp3 + 2.0d0*vrsab*(2.0d0*saps*sbap*sac*scr)

    2. Multiply sapssbap and sacscr in two separate loops. (Dimensions can be found as maximal indices in the original loop):

      REAL*16, DIMENSION(nvb, noccb) :: saps_sbap
      REAL*16, DIMENSION(nocca, nva) :: sac_scr
      

    following loops multiply sapssbap and sacscr:

    c      Calculate saps * sbap 
    
       do k = 1, noccb
         do n = 1, nvb
           saps_sbap(n, k) = 0.0d0
           do j = 1, nocca
             saps = oab(j, n + noccb)
             sbap = oab(j, k)   
             saps_sbap(n, k) = saps_sbap(n, k) + saps*sbap
           end do
         end do
       end do 
    
    c      Calculate sac * scr
       do i = 1, nocca
         do m = 1, nva
           sac_scr(i, m) = 0.0d0
           do l = 1, noccc
             sac = oac(i, l)
             scr = oac(m + nocca, l)
             sac_scr(i, m) = sac_scr(i, m) + sac*scr 
           end do
         end do
       end do
    
    1. Finally put all things together in such a loop:

      do i = 1, nocca do k = 1, noccb do m = 1, nva do n = 1, nvb im = i + nocca*(m-1) kn = k + noccb*(n-1) imkn = im +oava*(kn-1) vrsab = ovovab(imkn) demp3 = demp3 + 2.0d0 * vrsab * 2.0d0 *saps_sbap(n,k) * sac_scr(i,m) end do end do end do end do

    In such a way instead of O(N^6) loop there are two O(N^3) and one O(N^4)

    Sorry for the last piece format, line-end didn't worked.