Search code examples
opencvimage-processingcudagpu

Cuda Kernel Code doesn't cover all the image


I'm trying to equalize a histogram by writing cuda kernel code. The main problem is that the operation seems correct up to a certain line of the image, after which the output is wrong, but I can't understand why, below is the code and images.

Expected output:

Expected output

Output obtained:

Output obtained

I can't understand why, I have already checked that the histogram contains the exact values, in fact the expected output is obtained by performing the operation on the CPU and I have no problems, as soon as I pass the "Output Obtained" image to the GPU is the result. Now I show the code

Cuda Kernel Code

__global__ void equalizeHistCUDA(unsigned char* input, unsigned char* output, int *cumulative_hist, int cols, int rows) {
    int nGrayLevels = 256, area = cols*rows;
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < rows && j < cols){  
        int pixelValue = input[i * cols + j];
        output[i * cols + j] = static_cast<uchar>((static_cast<double>(nGrayLevels) / area) * cumulative_hist[pixelValue]);
    }
}

Part of the main

nThreadPerBlocco = dim3(4, 3);
    numBlocks.y = gpu_resizedImage.rows / nThreadPerBlocco.y + ((gpu_resizedImage.rows % nThreadPerBlocco.y) == 0 ? 0 : 1);
    numBlocks.x = gpu_resizedImage.cols / nThreadPerBlocco.x + ((gpu_resizedImage.cols % nThreadPerBlocco.x) == 0 ? 0 : 1);
    cv::cuda::GpuMat gpu_equalizedImage(600,600,CV_8UC1);
equalizeHistCUDA<<<numBlocks,nThreadPerBlocco>>>(gpu_resizedImage.data,gpu_equalizedImage.data,cumHist_device,gpu_resizedImage.cols,gpu_resizedImage.rows);
    cudaDeviceSynchronize();
    cudaError_t cudaErr = cudaGetLastError();
    if (cudaErr != cudaSuccess)
        fprintf(stderr, "Errore CUDA: %s\n", cudaGetErrorString(cudaErr));

    cv::Mat img;
    gpu_equalizedImage.download(img);
    cv::imwrite("test.jpg",img);

I have already checked that all the structures I use are correctly filled. Any suggestions?


Solution

  • Apparently OpenCV x CUDA does not allocate images on gpu contiguously so my CUDA kernel code computed addresses in linear memory, but this was not useful since OpenCV does not store as it stores on cpu. The solution is to use this to allocate the memory on the GPU:

    gpu_equalizedImageSM = cv::cuda::createContinuous(gpu_resizedImage.rows,
                               gpu_resizedImage.cols,
                               CV_8UC1);
    
    

    Having done this it is possible to access as I did in the kernel.