nvcc with AVX support cannot find gcc builtin intrinsics

This is my first question ;-)

I am trying to use AVX in a CUDA application (ccminer), but nvcc reports an error:

/usr/local/cuda/bin/nvcc -Xcompiler "-Wall -mavx" -O3 -I . -Xptxas "-abi=no -v" -gencode=arch=compute_50,code=\"sm_50,compute_50\" --maxrregcount=80 --ptxas-options=-v -I./compat/jansson -o x11/x11.o -c x11/x11.cu
/usr/lib/gcc/x86_64-linux-gnu/4.8/include/avxintrin.h(118): error: identifier "__builtin_ia32_addpd256" is undefined

[...]

This is only the first error; many more builtin functions are reported as undefined :-(

Everything compiles fine for plain C/C++ sources (files with .c or .cpp extensions), but .cu files produce the error. What am I doing wrong? I can build ccminer, but I can only add AVX intrinsics to .c files, not to .cu files. I am using the Intel intrinsics from immintrin.h, not the gcc builtins directly.

Any help greatly appreciated. Thanks in advance.

Linux Mint (ubuntu 13) 64bit, gcc 4.8.1, cuda 6.5.

I do not expect AVX to work on the GPU. The .cu file contains a small portion of CPU-based code that I want to vectorize.

Here is an example that reproduces the error. I took the simplest example from: http://computer-graphics.se/hello-world-for-cuda.html

I added this line at the beginning:

#include <immintrin.h>

and tried to compile with the command:

nvcc cudahello.cu -Xcompiler -mavx

and got this error:

/usr/lib/gcc/x86_64-linux-gnu/4.8/include/avxintrin.h(118): error: identifier "__builtin_ia32_addpd256" is undefined

The same code without #include <immintrin.h> compiles without problems.

Here is the whole code:

#include <stdio.h>
#include <stdlib.h>  // EXIT_SUCCESS
#if defined(__AVX__)
#include <immintrin.h>
#endif

const int N = 16; 
const int blocksize = 16; 

__global__ 
void hello(char *a, int *b) 
{
    a[threadIdx.x] += b[threadIdx.x];
}

int main()
{
    char a[N] = "Hello \0\0\0\0\0\0";
    int b[N] = {15, 10, 6, 0, -11, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

    char *ad;
    int *bd;
    const int csize = N*sizeof(char);
    const int isize = N*sizeof(int);

    printf("%s", a);

    cudaMalloc( (void**)&ad, csize ); 
    cudaMalloc( (void**)&bd, isize ); 
    cudaMemcpy( ad, a, csize, cudaMemcpyHostToDevice ); 
    cudaMemcpy( bd, b, isize, cudaMemcpyHostToDevice ); 

    dim3 dimBlock( blocksize, 1 );
    dim3 dimGrid( 1, 1 );
    hello<<<dimGrid, dimBlock>>>(ad, bd);
    cudaMemcpy( a, ad, csize, cudaMemcpyDeviceToHost ); 
    cudaFree( ad );
    cudaFree( bd );

    printf("%s\n", a);
    return EXIT_SUCCESS;
}

Compile with

nvcc cudahello.cu -Xcompiler -mavx

to get the error or with

nvcc cudahello.cu

to compile clean.

Solution

I think I have an answer. Functions like:

__builtin_ia32_addpd256

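are built into gcc, and nvcc does not know about them. The gcc headers pulled in by immintrin.h (such as avxintrin.h) call these builtins, so nvcc reports errors when it compiles a .cu file that includes immintrin.h. With this toolchain (gcc 4.8, CUDA 6.5) we therefore cannot mix CUDA features with gcc builtin functions in one file; the AVX code has to stay in .c or .cpp files that gcc compiles.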
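
One way to live with this restriction is to move the vectorized host code into its own source file that only gcc ever sees, and to call it from the .cu file through a plain declaration. The following is a minimal sketch of that idea, not the code from ccminer: the file name avx_helpers.cpp and the function add_arrays are hypothetical, and the scalar fallback is only there so the file still builds without -mavx.

```cpp
// avx_helpers.cpp (hypothetical file name): compiled by g++ only, never by nvcc.
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>   // safe here, because gcc knows its own builtins
#endif

// Hypothetical helper: out[i] = x[i] + y[i] for n doubles.
// extern "C" keeps the symbol name simple for the declaration in the .cu file.
extern "C" void add_arrays(const double *x, const double *y,
                           double *out, std::size_t n)
{
    std::size_t i = 0;
#if defined(__AVX__)
    // Process four doubles per iteration with unaligned AVX loads and stores.
    for (; i + 4 <= n; i += 4) {
        __m256d vx = _mm256_loadu_pd(x + i);
        __m256d vy = _mm256_loadu_pd(y + i);
        _mm256_storeu_pd(out + i, _mm256_add_pd(vx, vy));
    }
#endif
    // Scalar loop: handles the tail, or everything when AVX is not enabled.
    for (; i < n; ++i)
        out[i] = x[i] + y[i];
}
```

The .cu file then contains only the declaration (extern "C" void add_arrays(const double *, const double *, double *, size_t);) and does not include immintrin.h. Assuming the file names above, the build could look like this, with -mavx passed to g++ only and not to nvcc through -Xcompiler:

g++ -O3 -mavx -c avx_helpers.cpp -o avx_helpers.o
nvcc -c cudahello.cu -o cudahello.o
nvcc cudahello.o avx_helpers.o -o cudahello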