This is my first question ;-)
I am trying to use AVX in a CUDA application (ccminer), but nvcc reports an error:
/usr/local/cuda/bin/nvcc -Xcompiler "-Wall -mavx" -O3 -I . -Xptxas "-abi=no -v" -gencode=arch=compute_50,code=\"sm_50,compute_50\" --maxrregcount=80 --ptxas-options=-v -I./compat/jansson -o x11/x11.o -c x11/x11.cu
/usr/lib/gcc/x86_64-linux-gnu/4.8/include/avxintrin.h(118): error: identifier "__builtin_ia32_addpd256" is undefined
[...]
This is just the first error. There are many 'undefined' builtin functions :-(
Everything works for C/C++ sources (files with a .c or .cpp extension), but .cu files produce this error :-( What am I doing wrong? I can compile ccminer, but I can only add AVX intrinsics to .c files, not to .cu files. I use the Intel intrinsics from immintrin.h, not the gcc builtins directly.
Any help greatly appreciated. Thanks in advance.
Linux Mint (ubuntu 13) 64bit, gcc 4.8.1, cuda 6.5.
I do not expect AVX to work on the GPU. The .cu file contains a small portion of CPU-based code that I want to vectorize.
Here is an example that reproduces the error. I took the simplest example from: http://computer-graphics.se/hello-world-for-cuda.html
I added this line at the beginning:
#include <immintrin.h>
and tried to compile with the command:
nvcc cudahello.cu -Xcompiler -mavx
got an error:
/usr/lib/gcc/x86_64-linux-gnu/4.8/include/avxintrin.h(118): error: identifier "__builtin_ia32_addpd256" is undefined
The same code without #include <immintrin.h>
compiles without problems.
Here is the whole code:
#include <stdio.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif
const int N = 16;
const int blocksize = 16;
__global__
void hello(char *a, int *b)
{
    a[threadIdx.x] += b[threadIdx.x];
}

int main()
{
    char a[N] = "Hello \0\0\0\0\0\0";
    int b[N] = {15, 10, 6, 0, -11, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

    char *ad;
    int *bd;
    const int csize = N*sizeof(char);
    const int isize = N*sizeof(int);

    printf("%s", a);

    cudaMalloc( (void**)&ad, csize );
    cudaMalloc( (void**)&bd, isize );
    cudaMemcpy( ad, a, csize, cudaMemcpyHostToDevice );
    cudaMemcpy( bd, b, isize, cudaMemcpyHostToDevice );

    dim3 dimBlock( blocksize, 1 );
    dim3 dimGrid( 1, 1 );
    hello<<<dimGrid, dimBlock>>>(ad, bd);
    cudaMemcpy( a, ad, csize, cudaMemcpyDeviceToHost );
    cudaFree( ad );
    cudaFree( bd );

    printf("%s\n", a);
    return EXIT_SUCCESS;
}
Compile with
nvcc cudahello.cu -Xcompiler -mavx
to get the error or with
nvcc cudahello.cu
to compile clean.
I think I have an answer. Functions like:
__builtin_ia32_addpd256
are built into gcc, and nvcc does not know about them. The headers pulled in by immintrin.h (such as avxintrin.h) call these builtins directly, so nvcc reports errors when it compiles a .cu file that includes immintrin.h. So, at least with this toolchain (gcc 4.8 and CUDA 6.5), we cannot mix CUDA features with gcc builtin functions in one file.
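One possible workaround, sketched here under assumptions rather than taken from ccminer: move the AVX code into a separate .cpp file that g++ compiles, so nvcc never sees immintrin.h, and only declare the function in the .cu file. The file name avx_add.cpp and the function add_doubles are hypothetical names chosen for illustration. The scalar fallback is there so the file still builds when -mavx is not passed.

```cpp
// avx_add.cpp: host-only helper, compiled by g++ and never parsed by nvcc.
// File and function names are hypothetical, chosen for illustration.
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// extern "C" gives the symbol an unmangled name, so the .cu file
// can declare it with a plain prototype.
extern "C" void add_doubles(const double *a, const double *b,
                            double *out, std::size_t n)
{
    std::size_t i = 0;
#if defined(__AVX__)
    // Process four doubles per iteration. The unaligned load/store
    // variants accept pointers with any alignment.
    for (; i + 4 <= n; i += 4) {
        __m256d va = _mm256_loadu_pd(a + i);
        __m256d vb = _mm256_loadu_pd(b + i);
        _mm256_storeu_pd(out + i, _mm256_add_pd(va, vb));
    }
#endif
    // Scalar loop: handles the tail, or everything when AVX is not enabled.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// In the .cu file, declare the helper instead of including <immintrin.h>:
//     extern "C" void add_doubles(const double *a, const double *b,
//                                 double *out, size_t n);
```

A build along these lines should then keep the two compilers apart (object file names are again just examples):
g++ -O3 -mavx -c avx_add.cpp -o avx_add.o
nvcc -c cudahello.cu -o cudahello.o
nvcc cudahello.o avx_add.o -o cudahello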