I have a CUDA code into which I would like to include external code consisting of Fortran with OpenACC kernels. I have two files with the following content, inspired by a discussion on the NVIDIA website. The file main.cu is the following:
#include <cstdio>
extern "C" void saxpy(int*, float*, float*, float*);
int main(int argc, char **argv)
{
    float* x;
    float* y;
    float* dx;
    float* dy;
    int n = 1<<20;
    x = (float*) malloc(n*sizeof(float));
    y = (float*) malloc(n*sizeof(float));
    for (int i=0; i<n; ++i)
    {
        x[i] = 1.f;
        y[i] = 0.f;
    }
    cudaMalloc((void**) &dx, (size_t) n*sizeof(float));
    cudaMalloc((void**) &dy, (size_t) n*sizeof(float));
    cudaMemcpy(dx, x, (size_t) n*sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y, (size_t) n*sizeof(float), cudaMemcpyHostToDevice);
    float factor = 2.f;
    saxpy(&n, &factor, dx, dy);
    cudaMemcpy(y, dy, (size_t) n*sizeof(float), cudaMemcpyDeviceToHost);
    printf("%f, %f\n", y[0], y[n-1]);
    return 0;
}
The second file, saxpy.f90, is:
subroutine saxpy(n, a, x, y) bind(c, name="saxpy")
    use iso_c_binding, only: c_int, c_float
    integer(kind=c_int), intent(in) :: n
    real(kind=c_float), intent(in) :: a
    real(kind=c_float), dimension(n), intent(in) :: x(n)
    real(kind=c_float), dimension(n), intent(inout) :: y(n)
    !$acc parallel deviceptr(x, y)
    do i = 1, n
        y(i) = y(i) + a*x(i)
    end do
    !$acc end parallel
end subroutine
How do I compile this with nvcc and the PGI compiler combined? I have tried many different options, but I always end up with unresolved externals.
What I tried for the Fortran file is:
pgf90 -ta=tesla:cc35 -acc saxpy.f90 -c
That compiles fine. The next step is where I am stuck. This:
nvcc -arch=sm_35 -ccbin pgc++ main.cu saxpy.o
yields unresolved externals that I do not know how to resolve. How can I find out which external libraries to include?
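The symbols are most likely missing because you are not adding the OpenACC or Fortran runtime libraries to your link. Also, when not using a PGI driver to link, you need to add the "nordc" flag, which disables relocatable device code (RDC). For example (targeting cc70 here; substitute the compute capability of your own GPU, such as cc35 and sm_35 from the question):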
% pgfortran -c -ta=tesla:cc70,nordc saxpy.f90
% nvcc -arch=sm_70 -ccbin pgc++ -Xcompiler "-ta=tesla:cc70 -pgf90libs" main.cu saxpy.o
% a.out
2.000000, 2.000000
However, I'd recommend linking with pgfortran instead, so that you can use RDC and don't need to add the Fortran runtime libraries explicitly:
% nvcc -arch=sm_70 -ccbin pgc++ -c main.cu
% pgfortran -Mcuda -ta=tesla:cc70 -Mnomain saxpy.f90 main.o
saxpy.f90:
% a.out
2.000000, 2.000000