
Allocating device memory for a __global__ function in CUDA


I want to write this program in CUDA.

1. In "main.cpp":

struct Center{
    double * Data;
    int dimension;
};
typedef struct Center Center;

// I allocate a pointer to M Center elements with cudaMalloc, as follows

....
#include "kernel.cu"
....
Center *V_dev;
int M = 100, N = 4;

cudaStatus = cudaMalloc((void**)&V_dev, M*sizeof(Center));
Init<<<1,M>>>(V_dev, M, N); // I always know the dimension N before calling

My "kernel.cu" file is something like this

#include "cuda_runtime.h"
#include"device_launch_parameters.h"
... //other include headers to allow my .cu file to know the Center type definition

__global__ void Init(Center *V, int N, int dimension){
V[threadIdx.x].dimension = dimension;
V[threadIdx.x].Data = (double*)malloc(dimension*sizeof(double));
for(int i=0; i<dimension; i++)
    V[threadIdx.x].Data[i] = 0; //For the value, it can be any kind of operation returning a float that i want to be able put here

} 

I'm using Visual Studio 2008 and CUDA 5.0. When I build my project, I get this error:

error: calling a __host__ function("malloc") from a __global__ function("Init") is not allowed

How can I make this work? (I know that malloc and the other CPU memory allocation functions are normally not allowed for device memory.)


Solution

  • malloc is allowed in device code but you have to be compiling for a cc2.0 or greater target GPU.

    Adjust your VS project settings to remove any GPU device setting like compute_10,sm_10 and replace it with compute_20,sm_20 or higher to match your GPU. (And, to run that code, your GPU needs to be cc2.0 or higher.) A minimal sketch of the resulting pattern is shown below.
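
Here is a minimal, self-contained sketch of the pattern, not the asker's full project: each thread allocates its element's Data buffer with device-side malloc inside the kernel, and a second kernel releases it with device-side free. The file name, the Cleanup kernel, and the optional heap-size call are assumptions added for illustration; compile for a cc2.0+ target, e.g. nvcc -arch=sm_20 init_demo.cu.

#include <cstdio>
#include "cuda_runtime.h"

struct Center {
    double *Data;
    int dimension;
};

__global__ void Init(Center *V, int dimension) {
    // Device-side malloc draws from the device heap (cc2.0+ only).
    V[threadIdx.x].dimension = dimension;
    V[threadIdx.x].Data = (double*)malloc(dimension * sizeof(double));
    if (V[threadIdx.x].Data == NULL) return;   // heap exhausted: always check
    for (int i = 0; i < dimension; i++)
        V[threadIdx.x].Data[i] = 0.0;
}

__global__ void Cleanup(Center *V) {
    // Memory obtained from device-side malloc must be freed with device-side
    // free, not with cudaFree from the host.
    free(V[threadIdx.x].Data);
}

int main() {
    const int M = 100, N = 4;
    Center *V_dev;

    // Optional (assumption): enlarge the device heap if the per-thread
    // allocations are large; the default heap is only a few MB.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 16 * 1024 * 1024);

    cudaMalloc((void**)&V_dev, M * sizeof(Center));
    Init<<<1, M>>>(V_dev, N);
    Cleanup<<<1, M>>>(V_dev);
    cudaDeviceSynchronize();
    cudaFree(V_dev);

    printf("last CUDA error: %s\n", cudaGetErrorString(cudaGetLastError()));
    return 0;
}

Note that the Data pointers set inside the kernel point to device heap memory: they can only be dereferenced in device code, so the host cannot read V_dev[i].Data directly even after copying the Center structs back.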