Search code examples
c++fortranopenacc

OpenACC: Why updating an array depends on the location of the update directive


I'm new to openacc. I'm trying to use it to accelerate a particle code. However, I don't understand why when updating an array (eta in the program below) on the host, it gives different results depending on the location of '!$acc update self'. Here is a code that re-produce this problem:

program approximateFun
use funs
use paras_mod

    integer :: Nx
    real(dp) :: dx
    real(dp), dimension(:), allocatable :: x, eta
    real(dp), dimension(5) :: xp, fAtxp

    !$acc declare create(Nx)
    !$acc declare create(x)
    !$acc declare create(eta)
    !$acc declare create(dx)
    !$acc declare create(fAtxp)
    !$acc declare create(xp)

    Nx = 16
    !$acc update device(Nx)

    xp = (/3.9, 4.1, 4.5, 5.0, 5.6/)
    !$acc update device(xp)

    allocate(x(1 : Nx))
    allocate(eta(1 : Nx))
    eta = 0.0d0

    dx = 2 * pi / (Nx - 1)
    !$acc update device(dx)

    do i = 1, Nx
        x(i) = (i - 1.0d0) * dx
    end do
    !$acc update device(x)


    call calc_etaVec(x, Nx, eta)
    !$acc update self(eta)        ! gives the correct results

    !$acc parallel loop present(dx, xp, eta, fAtxp)
    do i = 1, 5
        call calcFunAtx(xp(i), dx, eta, fAtxp(i))
    end do

    !$acc update self (fAtxp)
    !!$acc update self(eta) !---> gives wrong result

    write(6, *) 'eta', eta
    do i = 1, 5
        write(6, *) 'xp, fAtxp', xp(i), fAtxp(i)
    end do

    deallocate(x)
    deallocate(eta)

end program approximateFun

The previous program uses the following modules

MODULE funs
    use paras_mod
    implicit none
    CONTAINS

    subroutine calc_etaVec(x, nx, eta)
        integer, intent(in) :: nx
        real(dp), dimension(:), intent(in) :: x
        real(dp), dimension(:), intent(out) :: eta

        integer :: i
        !$acc parallel loop present(x, eta)
        do i = 1, nx
            eta(i) = sin(x(i))
        end do
    end subroutine

    subroutine calcFunAtx(xp, dx, eta, fAtx)
        real(dp), intent(in) :: xp, dx
        real(dp), dimension(:), intent(in) :: eta
        real(dp), intent(out) :: fAtx
        integer :: idx

        !$acc routine seq

        idx = 1 + floor(xp / dx)
        fAtx = eta(idx)
    end subroutine calcFunAtx

END MODULE

and

module paras_mod
    implicit none
    save
    INTEGER, PARAMETER :: dp = selected_real_kind(14,300)
    REAL(dp), PARAMETER :: PI=4.0d0*atan(1.0d0)
end module paras_mod

When using !$acc update self(eta) directly after call calc_etaVec(x, Nx, eta), eta is updated correctly. But when used after the loop, only the first five elements are correct, while the remaining are zeros. What are the reasons behind that?

thanks

The output when !$acc update self(eta) is used directly after call calc_etaVec(x, Nx, eta) is

   0.000000000000000        0.4067366430758002
   0.7431448254773941        0.9510565162951535        0.9945218953682734 
   0.8660254037844388        0.5877852522924732        0.2079116908177597 
  -0.2079116908177591       -0.5877852522924730       -0.8660254037844384 
  -0.9945218953682732       -0.9510565162951536       -0.7431448254773946 
  -0.4067366430758009       -2.4492935982947064E-016

which is correct. However, when used after the loop, the output is

    0.000000000000000        0.4067366430758002
    0.7431448254773941        0.9510565162951535        0.9945218953682734
    0.000000000000000         0.000000000000000         0.000000000000000
    0.000000000000000         0.000000000000000         0.000000000000000
    0.000000000000000         0.000000000000000         0.000000000000000
    0.000000000000000         0.000000000000000

Solution

  • This one was perplexing till I determined that its a compiler error. I've filed a problem report, TPR #32673, and sent it to engineering for review.

    When setting the environment variable NV_ACC_NOTIFY=2, which shows the data movement, I see that the compiler is only copying 40 bytes, versus the correct 128. However, if I remove "eta" from the preceding present clause, then it's correct.

    #ifdef WORKS
        !$acc parallel loop present(dx, fAtxp)
    #else
        !$acc parallel loop present(dx, fAtxp, eta)
    #endif
        do i = 1, 5
            call calcFunAtx(xp(i), dx, eta, fAtxp(i))
        end do
    

    Also, this only occurs when using an allocatable in a declare create directive. If you switched to using data regions, the issue doesn't occur. Probably why we didn't see it before (looks like the bugs been there since mid-2020). Using the declare directive in anything but a module is rare.