Tags: cuda, nvcc, openacc, pgi

Calling Fortran OpenACC from a CUDA file. How to compile with PGI?


I have a CUDA program that needs to call external Fortran code containing OpenACC kernels. The two files below are based on a discussion on the NVIDIA website. The file main.cu is:

#include <cstdio>
#include <cstdlib>  // malloc

extern "C" void saxpy(int*, float*, float*, float*);

int main(int argc, char **argv)
{
    float* x;
    float* y;
    float* dx;
    float* dy;

    int n = 1<<20;

    x = (float*) malloc(n*sizeof(float));
    y = (float*) malloc(n*sizeof(float));

    for (int i=0; i<n; ++i)
    {
        x[i] = 1.f;
        y[i] = 0.f;
    }

    cudaMalloc((void**) &dx, (size_t) n*sizeof(float));
    cudaMalloc((void**) &dy, (size_t) n*sizeof(float));

    cudaMemcpy(dx, x, (size_t) n*sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y, (size_t) n*sizeof(float), cudaMemcpyHostToDevice);

    float factor = 2.f;
    saxpy(&n, &factor, dx, dy);

    cudaMemcpy(y, dy, (size_t) n*sizeof(float), cudaMemcpyDeviceToHost);
    printf("%f, %f\n", y[0], y[n-1]);

    return 0;
}

The second file saxpy.f90 is:

subroutine saxpy(n, a, x, y) bind(c, name="saxpy")
    use iso_c_binding, only: c_int, c_float

    integer(kind=c_int), intent(in) :: n
    real(kind=c_float), intent(in) :: a
    real(kind=c_float), dimension(n), intent(in) :: x
    real(kind=c_float), dimension(n), intent(inout) :: y

    !$acc parallel deviceptr(x, y)
    do i = 1, n
        y(i) = y(i) + a*x(i)
    end do
    !$acc end parallel
end subroutine

How do I compile this with nvcc and the PGI compiler combined? I have tried many different options, but I always end up with unresolved externals.

What I tried: pgf90 -ta=tesla:cc35 -acc saxpy.f90 -c for the Fortran file, which compiles fine. The next step is where I am stuck: nvcc -arch=sm_35 -ccbin pgc++ main.cu saxpy.o fails with unresolved externals that I do not know how to resolve. How can I find out which external libraries to include?


Solution

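  • The symbols are most likely missing because the link line includes neither the OpenACC nor the Fortran runtime libraries. Also, when you link with something other than a PGI driver, you need to add the "nordc" sub-option to -ta so that the OpenACC code is built without relocatable device code (RDC). For example (these commands target cc70; substitute your GPU's compute capability, such as the cc35 and sm_35 used in the question):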

    % pgfortran -c -ta=tesla:cc70,nordc saxpy.f90                                       
    % nvcc -arch=sm_70 -ccbin pgc++ -Xcompiler "-ta=tesla:cc70 -pgf90libs" main.cu saxpy.o
    % a.out
    2.000000, 2.000000
    

    That said, I'd recommend linking with pgfortran instead, so that RDC stays enabled and the Fortran runtime libraries are added for you:

    % nvcc -arch=sm_70 -ccbin pgc++ -c main.cu
    % pgfortran -Mcuda -ta=tesla:cc70 -Mnomain saxpy.f90 main.o
    saxpy.f90:
    % a.out
    2.000000, 2.000000
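
    As a sanity check on the expected output, a host-only reference can be sketched as below. It is not part of the original answer: the names saxpy_reference and run_reference are hypothetical, and the function works on ordinary host arrays rather than the device pointers that main.cu passes to the Fortran routine. It keeps the same calling convention (scalars passed by pointer, as bind(c) without the value attribute expects). With x = 1, y = 0 and a = 2, every element becomes 0 + 2*1 = 2, which matches the 2.000000, 2.000000 printed above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-only counterpart of the Fortran saxpy routine.
// Same argument convention as the bind(c) interface (n and a by pointer),
// but x and y are assumed to be HOST arrays, not device pointers.
extern "C" void saxpy_reference(const int* n, const float* a,
                                const float* x, float* y)
{
    for (int i = 0; i < *n; ++i)
        y[i] = y[i] + (*a) * x[i];
}

// Builds the same inputs as main.cu (x = 1, y = 0), applies the reference
// routine, and returns the resulting y.
std::vector<float> run_reference(int n, float factor)
{
    assert(n > 0);
    std::vector<float> x(n, 1.f);
    std::vector<float> y(n, 0.f);
    saxpy_reference(&n, &factor, x.data(), y.data());
    return y;
}
```

    Calling run_reference(1 << 20, 2.f) and comparing its first and last elements with the values printed by a.out is one way to confirm that the mixed nvcc/PGI build actually ran the OpenACC loop on the device data.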