Efficient sum, vector-matrix-vector by Fortran

I want to compute the summation more efficient. There are nested loops, which I want to avoid.

! If i+j <= I,then A_{j} = \sum_{m,n,i} C_{m, i+j} G_{ m, n, i, j} C_{n, i}

! else if i+j >= I, then A_{j} = \sum_{m,n,i} C_{m, i+j-I} G_{ m, n, i, j} C_{n, i}

program main
  implicit none
  real, allocatable :: A(:)
  real, allocatable :: C(:,:), G(:,:,:,:)
  integer :: i, j, m, n
  integer, parameter :: N = 1500, I = 2000

  allocate(A(J))
  allocate(C(N,I))
  allocate(G(N,N,I,I))
  ! If i+j <= I,then
  ! A_{j} = \sum_{m,n,i} C_{m, i+j} G_{ m, n, i, j} C_{n, i} 
  ! else if i+j >= I, then
  ! A_{j} = \sum_{m,n,i} C_{m, i+j-I} G_{ m, n, i, j} C_{n, i}
  do j = 1, I
    do i = 1, I
      if ( i + j <= I ) then
        do n = 1, N
          do m = 1, N
            A(j) = A(j) + C(m,i+j) * G(m,n,i,j) * C(n,i)
          end do
        end do
      else
        do n = 1, N
          do m = 1, N
            A(j) = A(j) + C(m,i+j-I) * G(m,n,i,j) * C(n,i)
          end do
        end do
      end if
    end do
  end do
  
end program main

Solution

Let's start by addressing @francescalus' comment, and rename I and N. Let's call them II and NN. No, this is not a good naming convention, but I don't know what these variables actually are.

Let's also assume you've initialised your variables, as per @lastchance's comment.

The first thing that leaps out is the lines

do j=1,II
  do i=1,II
    if (i+j <= II) then
      ...
    else
      ...

this loop-with-an-if-inside can be replaced with a couple of loops with no if, which is usually a good idea. Something like:

do j=1,II
  do i=1,II-j
    ...
  end do
  do i=II-j+1,II
    ...
  end do

Now let's use matmul to clean up the loops over m and n, something like:

do j=1,II
  do i=1,II-j
    A(j) = A(j) + dot_product(matmul(C(:,i+j), G(:,:,i,j)), C(:,i))
  end do
  do i=II-j+1,II
    A(j) = A(j) + dot_product(matmul(C(:,i+j-I), G(:,:,i,j)), C(:,i))
  end do
end do

Note that this isn't doing any less work than your code (and indeed, since you're looping over all four indices of G, doing less work doesn't look possible), but it should at least do the work a little more efficiently.