I have very tedious task to optimize some ancient Fortran77 code. Honestly, I don't know fortran at all. I know how loops works and how to multiply matrices. I also know that this loop can be optimized to few 3-4 nested loops:
do i = 1, nocca
do j = 1, nocca
do k = 1, noccb
do l = 1, noccc
do m = 1, nva
do n =1, nvb
saps = oab(j, n+noccb)
sbap = oab(j, k)
sac = oac(i, l)
scr = oac(m + nocca, l)
im = i + nocca*(m-1)
kn = k + noccb*(n-1)
imkn = im + oava*(kn-1)
vrsab = ovovab(imkn)
demp3 = demp3 + 2.0d0*vrsab*(2.0d0*saps*sbap*sac*scr)
end do
end do
end do
end do
end do
end do
I was trying to calculate sapssac in the separate loop and similarly sacscr:
c Calculate saps * sbap
do j = 1, nocca
do k = 1, noccb
do n = 1, nvb
saps = oab(j, n + noccb)
sbap = oab(j, k)
saps_sbap(j, k) = saps_sbap(j, k) + saps*sbap
end do
end do
end do
c Calculate sac_scr
do i = 1, nocca
do l = 1, noccc
do m = 1, nva
sac = oac(i, l)
scr = oac(m + nocca, l)
sac_scr(i, l) = sac_scr(i, l) + sac*scr
end do
end do
end do
Finally I would like to write the last part to calculate demp3 but there are 5 indices not 4 as I expected. Maybe I'm doing this entirely wrong?
Any suggestions? hints?
Thanks in advance!
It is clearly easier to downwote than try to understand a problem. I found optimal solution by myself. Here it is:
Split large sum into two components. Let's consider only the first one:
demp3 = demp3 + 2.0d0*vrsab*(2.0d0*saps*sbap*sac*scr)
Multiply sapssbap and sacscr in two separate loops. (Dimensions can be found as maximal indices in the original loop):
REAL*16, DIMENSION(nvb, noccb) :: saps_sbap
REAL*16, DIMENSION(nocca, nva) :: sac_scr
following loops multiply sapssbap and sacscr:
c Calculate saps * sbap
do k = 1, noccb
do n = 1, nvb
saps_sbap(n, k) = 0.0d0
do j = 1, nocca
saps = oab(j, n + noccb)
sbap = oab(j, k)
saps_sbap(n, k) = saps_sbap(n, k) + saps*sbap
end do
end do
end do
c Calculate sac * scr
do i = 1, nocca
do m = 1, nva
sac_scr(i, m) = 0.0d0
do l = 1, noccc
sac = oac(i, l)
scr = oac(m + nocca, l)
sac_scr(i, m) = sac_scr(i, m) + sac*scr
end do
end do
end do
Finally put all things together in such a loop:
do i = 1, nocca
do k = 1, noccb
do m = 1, nva
do n = 1, nvb
im = i + nocca*(m-1)
kn = k + noccb*(n-1)
imkn = im +oava*(kn-1)
vrsab = ovovab(imkn)
demp3 = demp3 + 2.0d0 * vrsab * 2.0d0 *saps_sbap(n,k) * sac_scr(i,m)
end do
end do
end do
end do
In such a way instead of O(N^6) loop there are two O(N^3) and one O(N^4)
Sorry for the last piece format, line-end didn't worked.