Tags: c++, opencv, cuda, thrust

cv::gpu::GpuMat constructor fails if there is a thrust::reduce() call LATER in the code


I have a VS 2013 project that uses (the somewhat outdated) OpenCV 2.4.9 and CUDA 7.5. I discovered that if the code contains some (but not all) Thrust calls, thrust::reduce() in particular, then OpenCV GPU code stops working even though it executes BEFORE any Thrust calls. Specifically, the cv::gpu::GpuMat constructor fails inside the cudaMallocPitch call with an access violation at a NULL address. I'd like to know whether I'm missing something before I urge everyone to upgrade to the latest OpenCV version (which might or might not help anyway).

Here is a more or less minimal example that reproduces the error:

// main.cu
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <opencv2/gpu/gpu.hpp>
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>

#include <stdio.h>

int main()
{
    const int arraySize = 5;
    float fc[arraySize] = { 0 };
    float* dev_c;

    cv::Mat m = cv::Mat::eye(100,100,CV_32F);
    cv::gpu::GpuMat g(m);

    // dev_c holds floats, so size the buffer and the copy with sizeof(float)
    cudaMalloc((void**)&dev_c, arraySize * sizeof(float));
    cudaMemcpy(dev_c, fc, arraySize * sizeof(float), cudaMemcpyHostToDevice);
    thrust::device_ptr<float> dev_ptr = thrust::device_pointer_cast(dev_c);
    // the line below works fine
    thrust::transform(dev_ptr, dev_ptr + arraySize, dev_ptr, dev_ptr, thrust::multiplies<float>());
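    // the line below causes cv::gpu::GpuMat to crash, but the program works if it is commented out
    // (init is 0.0f rather than 0: thrust::reduce returns the type of init, so 0 would yield an int)
    float sum2 = thrust::reduce(dev_ptr, dev_ptr + arraySize, 0.0f, thrust::plus<float>());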
    cudaFree(dev_c);
}

Solution

Wow, I decided to study the project settings, and it turns out that by default the CUDA code generation option is set to compute_20,sm_20. I changed it to compute_50,sm_50, since I'm using a GTX 750 Ti and OpenCV was also compiled with CUDA_ARCH_BIN set to 5.0, and now everything works.