How does nppiResizeSqrPixel_32f_C4R() work?

How does the above function perform its operation in CUDA? Do we need to call cudaMalloc() or cudaMemcpy() along with it, or does a single call do all of this internally?

I wrote:

nppiResizeSqrPixel_32f_C4R(&in[0],sizeofImage,StepSize,&out[0],StepSizeOutput,DestRoi,Xfactor,YFactor,NULL,NULL,16);

Here 'in' is a vector holding the input image and 'out' is an empty vector. After executing the above function, the output vector is still '0'. Can you please explain how the function resizes the image?

Solution

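It operates on device data, so you will need to allocate device memory (e.g. with cudaMalloc), copy the input to the device (e.g. with cudaMemcpy), and copy the result back to the host after the call. The function does none of this for you.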
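As a rough illustration of that flow, the sketch below allocates device buffers, uploads the image, calls the resize, and downloads the result. The helper name `resizeOnDevice`, the tightly packed 4-channel float layout, and the choice of `NPPI_INTER_LINEAR` are assumptions made for the example and do not come from the question. It also assumes a toolkit version that still ships `nppiResizeSqrPixel_32f_C4R` with the source-ROI argument shown here.

```cuda
#include <cuda_runtime.h>
#include <npp.h>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of NPP): resizes a tightly packed 4-channel
// float image stored in host memory. Returns true on success.
bool resizeOnDevice(const std::vector<float>& in, int srcW, int srcH,
                    double xFactor, double yFactor,
                    std::vector<float>& out, int& dstW, int& dstH)
{
    const int channels = 4;
    dstW = static_cast<int>(srcW * xFactor);
    dstH = static_cast<int>(srcH * yFactor);
    if (srcW <= 0 || srcH <= 0 || dstW <= 0 || dstH <= 0) return false;
    if (in.size() != static_cast<std::size_t>(srcW) * srcH * channels) return false;

    // NPP steps are row pitches in bytes, not in pixels or floats.
    const int srcStep = srcW * channels * static_cast<int>(sizeof(Npp32f));
    const int dstStep = dstW * channels * static_cast<int>(sizeof(Npp32f));
    const std::size_t srcBytes = static_cast<std::size_t>(srcStep) * srcH;
    const std::size_t dstBytes = static_cast<std::size_t>(dstStep) * dstH;

    // Device buffers: NPP reads and writes these, not the host vectors.
    Npp32f* dSrc = nullptr;
    Npp32f* dDst = nullptr;
    bool ok = cudaMalloc(&dSrc, srcBytes) == cudaSuccess &&
              cudaMalloc(&dDst, dstBytes) == cudaSuccess &&
              cudaMemcpy(dSrc, in.data(), srcBytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        NppiSize srcSize = {srcW, srcH};        // width, height
        NppiRect srcROI  = {0, 0, srcW, srcH};  // x, y, width, height
        NppiRect dstROI  = {0, 0, dstW, dstH};
        NppStatus status = nppiResizeSqrPixel_32f_C4R(
            dSrc, srcSize, srcStep, srcROI,
            dDst, dstStep, dstROI,
            xFactor, yFactor, 0.0, 0.0,         // scale factors, then shifts
            NPPI_INTER_LINEAR);
        ok = (status == NPP_SUCCESS);
    }

    if (ok) {
        // Download the resized image into the host vector.
        out.resize(static_cast<std::size_t>(dstW) * dstH * channels);
        ok = cudaMemcpy(out.data(), dDst, dstBytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(dSrc);  // cudaFree accepts a null pointer
    cudaFree(dDst);
    return ok;
}
```

A call such as `resizeOnDevice(in, 8, 8, 0.5, 0.5, out, w, h)` would be expected to yield a 4x4 image in `out`. The build needs to link against the CUDA runtime and the NPP libraries, whose exact names vary between toolkit versions.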

A limited amount of documentation for NPP calls is available at the usual place, and there are CUDA sample codes that demonstrate some uses of NPP library calls.

For questions that are not addressed by those resources, you may also want to look at the Intel IPP documentation. The NPP routines in many cases closely mimic Intel IPP functionality, so you may get some insight there. Here is an example doc.

Also check the return values of all CUDA and NPP calls. You can also run your code with cuda-memcheck to get hints about what may be going wrong.
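One possible way to check return values is sketched below. The helper names `cudaOk` and `nppOk` are hypothetical and exist only for this example. The sketch relies on the NPP convention that a status of 0 means success, negative values are errors, and positive values are warnings.

```cuda
#include <cuda_runtime.h>
#include <npp.h>
#include <cstdio>

// Hypothetical helper: reports a failed CUDA runtime call and returns false.
bool cudaOk(cudaError_t err, const char* what)
{
    if (err == cudaSuccess) return true;
    std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    return false;
}

// Hypothetical helper: reports any non-success NPP status.
// Warnings (positive values) are printed but still treated as usable.
bool nppOk(NppStatus status, const char* what)
{
    if (status == NPP_SUCCESS) return true;
    std::fprintf(stderr, "%s returned NPP status %d (%s)\n", what,
                 static_cast<int>(status), status < 0 ? "error" : "warning");
    return status > 0;
}
```

Each call is then wrapped, for example `cudaOk(cudaMalloc(&ptr, bytes), "cudaMalloc")`, and the program stops or cleans up when a helper returns false. Running the binary as `cuda-memcheck ./your_app` (the binary name is a placeholder) can additionally flag invalid device memory accesses.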