Tags: matlab, memory-management, cuda, streaming, mex

How to efficiently pass a variable from MATLAB to the GPU asynchronously?


In my CUDA project, I can allocate pinned memory, copy data from a .txt file into it, and use streams to copy the data to the GPU while a kernel does my processing. Now I want to build a CUDA MEX file and pass my data (a variable called RfData) to it from MATLAB. However, I have noticed that there is no way to directly allocate an array coming from MATLAB as pinned CUDA memory, so I have to work with pageable memory:

    int* RfData = (int*)mxGetPr(prhs[0]);
    int* Device_RfData;
    const size_t ArrayByteSize_RfData = sizeof(int) * (96 * 96 * 2048);
    cudaMalloc((void**)&Device_RfData, ArrayByteSize_RfData);
    cudaMemcpy(Device_RfData, RfData, ArrayByteSize_RfData, cudaMemcpyHostToDevice);

I need to copy RfData asynchronously using streams. The only way I know is to copy RfData into pinned memory first and then use a stream:


    int* RfData_Pinned;
    cudaHostAlloc((void**)&RfData_Pinned, ArrayByteSize_RfData, cudaHostAllocWriteCombined);
    for (int j = 0; j < (96 * 96 * 2048); j++)
    {
        RfData_Pinned[j] = RfData[j];
    }
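
The pinned buffer is then handed to cudaMemcpyAsync on a stream, roughly as in this sketch (the stream and kernel launch here are only illustrative):

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    // Asynchronous host-to-device copy from the pinned staging buffer.
    cudaMemcpyAsync(Device_RfData, RfData_Pinned, ArrayByteSize_RfData,
                    cudaMemcpyHostToDevice, stream);
    // myKernel<<<grid, block, 0, stream>>>(Device_RfData, ...);  // processing kernel on the same stream
    cudaStreamSynchronize(stream);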

However, this extra host-to-host copy increases the overall processing time of my MEX function.

How can I transfer my data from MATLAB to the GPU asynchronously? Is there perhaps a CUDA command that allows a fast copy of data from pageable to pinned memory?

Thanks in advance, Moein.


Solution

  • You can indeed allocate pinned memory with cudaHostAlloc, but if the memory has already been allocated, you can instead just pin it in place with cudaHostRegister, which takes a pointer to an already allocated host array (the pointer returned by mxGetPr in your case); see the sketch below.

    Note that pinning the memory this way still takes time, but possibly less than calling cudaHostAlloc and then copying the data into it.
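
    A minimal sketch of that approach, assuming the MATLAB input is an int32 array (so mxGetData is used instead of mxGetPr), with an illustrative stream and kernel launch:

        // Pin the MATLAB-owned buffer in place, then copy it asynchronously.
        int* RfData = (int*)mxGetData(prhs[0]);           // assumes an int32 MATLAB array
        const size_t ArrayByteSize_RfData = sizeof(int) * (96 * 96 * 2048);

        int* Device_RfData;
        cudaMalloc((void**)&Device_RfData, ArrayByteSize_RfData);

        // Pin the existing host allocation; no host-to-host staging copy is needed.
        cudaHostRegister(RfData, ArrayByteSize_RfData, cudaHostRegisterDefault);

        cudaStream_t stream;
        cudaStreamCreate(&stream);

        // Asynchronous host-to-device copy issued on the stream.
        cudaMemcpyAsync(Device_RfData, RfData, ArrayByteSize_RfData,
                        cudaMemcpyHostToDevice, stream);
        // myKernel<<<grid, block, 0, stream>>>(Device_RfData, ...);  // illustrative kernel launch

        cudaStreamSynchronize(stream);

        // Unpin before returning, since MATLAB owns this buffer.
        cudaHostUnregister(RfData);
        cudaStreamDestroy(stream);
        cudaFree(Device_RfData);

    Whether this ends up faster than staging the data into a cudaHostAlloc buffer depends on the array size, so it is worth timing both variants on your data.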