fortran openacc pgi-accelerator

Reference Argument Passing with Nested OpenACC Routines


I'm attempting to parallelize some Fortran 90 code using OpenACC, where a parallelized loop calls a sequential routine. When I compile the code with the PGI Fortran compiler (2020.4), I get a message saying that reference argument passing prevents parallelization.

My understanding is that this is likely because one routine exists on the Host while the other is on the Device, but I'm unclear on where I might be missing a pragma that would lead to this outcome.

The basic structure of the calling routine is:

subroutine OuterRoutine(F,G,X,Y) 

      real(wp), dimension(:,:), intent(IN) :: X      
      real(wp), dimension(:,:), intent(IN) :: Y
      real(wp), dimension(1,PT), intent(OUT) :: F
      real(wp), dimension(N_p,PT), intent(OUT) :: G
        
      ! Local Variables
      integer :: t, i, j

      !$acc data copyin(X,Y), copyout(F,G)

      !$acc parallel loop
      do t = 1,PT,1
            
          !$acc loop collapse(2) reduction(+:intr)
          do i = 1,N_int-1,1
            do j = 1,N_int-1,1
              G(i,j) = intgrdJ2(X(i,j),X(j,i),Y(i,j),Y(j,i),t)
            end do
          end do
          !$acc end loop

      end do
      !$acc end parallel loop

      !$acc end data

end subroutine OuterRoutine 

And the function being called is:

function intgrdJ2(z,mu,p,q,t)
    !$acc routine seq
    
    real(wp), intent(IN) :: z, mu, p, q
    integer, intent(IN) :: t
    real(wp) :: intgrdJ2
        
    ! Local Variables
    real(wp) :: mu2
    real(wp), dimension(N_p) :: nu_m2, psi_m2
    integer :: i
        
    mu2 = (mu*fh_pdf(z,mu,p))/f_pdf(z,mu,p)
        
    do i = 1,N_p,1
        nu_m2(i) = interpValue(mu2,mugrid,nu_knots(:,i,t))
        psi_m2(i) = interpValue(mu2,mugrid,psi_knots(:,i,t))
    end do
    
    intgrdJ2 = nu_m2(i)*psi_m2(i)
    
end function intgrdJ2

The routines interpValue, fh_pdf, and f_pdf are all contained in a used module and marked with !$acc routine seq. The variables mugrid, nu_knots, and psi_knots are all module-level variables, which are copied in to the device before OuterRoutine is called.

When I compile the code, I get this sort of output from the compiler:

intgrdj2:
    576, Generating acc routine seq
         Generating Tesla code
    593, Reference argument passing prevents parallelization: mu2

Here, line 593 is the "nu_m2(i) = ..." line.

My understanding is that since the variable mu2 is a scalar declared inside the sequential routine, each thread should have its own copy of the variable, and I don't need to explicitly declare it private when I declare the data region. From reading this post, it seems that the problem may be related to where the routines are located (host vs. device). However, it seems as though all of the relevant pieces should be on the device, because I'm specifying that the routines are sequential.

As a first-time OpenACC user, any explanations about what I might be overlooking would be greatly appreciated!


Solution

  • My understanding is that since the variable mu2 is a scalar declared inside of the sequential routine, each thread should have its own copy of the variable, and I don't need to explicitly declare it to be private when I declare the data region

    This is true in most cases. But what's likely happening here is that, since Fortran passes arguments by reference by default, the compiler must assume that the callee could save a reference to mu2, for example through a module variable. If that happened, mu2 would no longer be private to the thread. Unlikely, but possible.

    The typical way to fix this is to pass the scalar by value, i.e. add the "value" attribute to the corresponding dummy argument declaration in "interpValue". Alternatively, you can explicitly privatize "mu2" by adding "!$acc loop seq private(mu2)" on the "i" loop.

    That said, the message may just be indicating that the compiler can't auto-parallelize this loop. Since the loop is in a sequential routine, that wouldn't matter and you could safely ignore the message. I don't have the full context, though, so I can't be 100% certain of this.
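
    As a rough illustration of the first suggestion, here is a minimal sketch of a by-value scalar argument. The body of interpValue is not shown in the question, so the piecewise-linear interpolation below, the module name interp_sketch, the definition of wp, and the test data are all assumptions made only to keep the example self-contained. The relevant part is the "value" attribute on x, which requires an explicit interface (a module procedure provides one).

    ```fortran
    module interp_sketch
      implicit none
      integer, parameter :: wp = kind(1.0d0)   ! assumed working precision
    contains

      ! Hypothetical stand-in for interpValue: piecewise-linear interpolation
      ! of the samples (xgrid, knots) at the point x.
      function interpValue(x, xgrid, knots) result(y)
        !$acc routine seq
        real(wp), value      :: x            ! by value: the callee receives a copy
        real(wp), intent(in) :: xgrid(:), knots(:)
        real(wp) :: y
        integer  :: k, n

        n = size(xgrid)
        ! Advance k until x lies in [xgrid(k), xgrid(k+1)], staying on the grid.
        k = 1
        do while (k < n - 1 .and. x > xgrid(k+1))
          k = k + 1
        end do
        y = knots(k) + (knots(k+1) - knots(k)) * (x - xgrid(k)) &
                       / (xgrid(k+1) - xgrid(k))
      end function interpValue

    end module interp_sketch

    program demo
      use interp_sketch
      implicit none
      real(wp) :: grid(3), vals(3), mu2

      grid = [0.0_wp, 1.0_wp, 2.0_wp]        ! made-up test data
      vals = [0.0_wp, 10.0_wp, 20.0_wp]
      mu2  = 0.5_wp

      ! The call site is unchanged; only the dummy argument declaration differs.
      print *, interpValue(mu2, grid, vals)
    end program demo
    ```

    Because the callee only ever sees a copy of mu2, there is no address for it to retain, which is the property the compiler could not otherwise prove. Whether this alone removes the message in your code depends on the parts of the program not shown here.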