cuda npp

NPP functions with a scratch buffer don't fill the output value


Here is some code where I'm trying to find the maximum:

// 1)
    // compute size of scratch buffer
    int nBufferSize;
    auto status = nppiMaxGetBufferHostSize_32f_C1R(size(img), &nBufferSize); 
    // status - No_Errors, nBufferSize - computed

// 2)
    // device memory allocation for scratch buffer
    Npp8u * pDeviceBuffer;
    auto res = cudaMalloc((void **)(&pDeviceBuffer), nBufferSize);
    // res - cudaSuccess

// 3)
    // call the NPP function
    // where:
    // - img is an npp::ImageNPP_32f_C1 from UtilNPP (NPP pointer wrapper for memory management)
    // - size(img) is a valid NppiSize value
    Npp32f max_ = 13;
    status = nppiMax_32f_C1R(img.data(), img.pitch(), size(img), pDeviceBuffer, &max_);
    // status = No_Errors, but the output value max_ is not changed!

// 4)
    // free device memory for scratch buffer
    cudaFree(pDeviceBuffer);

All functions return 0 (no errors), but the output value max_ is not calculated. I tried some other statistical functions that require a scratch buffer and got the same result. I'm using CUDA 6.5, and my code follows the sample in the NPP documentation on using functions with a scratch buffer. Does anyone have any ideas?


Solution

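  • nppiMax_32f_C1R and all other such variants require both the input and the output pointers to refer to device memory. In the question, &max_ is the address of a host variable, so the result never reaches it; the output for max_ has to be allocated on the device and copied back to the host afterwards. To make the above example work, you can do the following: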

    Npp32f max_ = 13;
    
    Npp32f* d_max_; //Device output
    cudaMalloc(&d_max_, sizeof(Npp32f));
    
    status = nppiMax_32f_C1R(img.data(), img.pitch(), size(img), pDeviceBuffer, d_max_);
    
    cudaMemcpy(&max_, d_max_, sizeof(Npp32f), cudaMemcpyDeviceToHost);
    cudaFree(d_max_);