Is it possible to call cublas functions from a device function?

In here Robert Crovella said that cublas routines can be called from device code. Although I am using dynamic parallelism and compiling with compute capability 3.5, I cannot manage to call Cublas routines from a device function. I always get the error "calling a host function from a device/global function is not allowed" My code contains device functions which call CUBLAS routines like cublsAlloc, cublasGetVector, cublasSetVector and cublasDgemm

My compilation and linking commands:

  
nvcc -arch=sm_35 -I. -I/usr/local/cuda/include -c -O3 -dc GPUutil.cu -o ./build/GPUutil.o   
nvcc -arch=sm_35 -I. -I/usr/local/cuda/include -c -O3 -dc DivideParalelo.cu -o ./build/DivideParalelo.o
nvcc -arch=sm_35 -I. -I/usr/local/cuda/include -dlink ./build/io.o ./build/GPUutil.o ./build/DivideParalelo.o -lcudadevrt -o ./build/link.o
icc -Wwrite-strings ./build/GPUutil.o ./build/DivideParalelo.o ./build/link.o -lcudadevrt -L/usr/local/cuda/lib64  -L~/Intel/composer_xe_2015.0.090/mkl/lib/intel64  -L~/Intel/composer_xe_2015.0.090/mkl/../compiler/lib/intel64  -Wl,--start-group ~/Intel/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_intel_lp64.a ~/Intel/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_sequential.a ~/Intel/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_core.a ~/Intel/composer_xe_2015.0.090/mkl/../compiler/lib/intel64/libiomp5.a -Wl,--end-group -lpthread  -lm  -lcublas -lcudart   -o DivideParalelo

Solution

Here you can find all the details about cuBLAS device API, such as:

Starting with release 5.0, the CUDA Toolkit now provides a static cuBLAS Library cublas_device.a that contains device routines with the same API as the regular cuBLAS Library. Those routines use internally the Dynamic Parallelism feature to launch kernel from within and thus is only available for device with compute capability at least equal to 3.5.

In order to use those library routines from the device the user must include the header file “cublas_v2.h” corresponding to the new cuBLAS API and link against the static cuBLAS library cublas_device.a.

If you still experience issues even after reading through the documentation and applying all of the steps described there, then ask for additional assistance.