I am writing a CUDA kernel in PyCUDA to convert an RGBA image to a grayscale image. Here is the PyCUDA code:
import numpy as np
import matplotlib.pyplot as plt
import pycuda.autoinit
import pycuda.driver as cuda
from pycuda.compiler import SourceModule
kernel = SourceModule("""
#include <stdio.h>
__global__ void rgba_to_greyscale(const uchar4* const rgbaImage,
                                  unsigned char* const greyImage,
                                  int numRows, int numCols)
{
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    if (y < numRows && x < numCols) {
        int index = numCols * y + x;
        uchar4 color = rgbaImage[index];
        unsigned char grey = (unsigned char)(0.299f * color.x + 0.587f * color.y +
                                             0.114f * color.z);
        greyImage[index] = grey;
    }
}
""")
However, the problem is how to relate uchar4* to a numpy array. I know I can modify my kernel function to accept int* or float* and make it work, but I just wonder how to make the above kernel function work as-is in PyCUDA.
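For context, the kind of workaround I mean would look roughly like this sketch (names are my own): a kernel that takes the pixels as packed 32-bit integers and unpacks the channels with shifts, so a plain uint32 numpy array can be passed with cuda.In:
kernel_packed = SourceModule("""
__global__ void rgba_to_greyscale_packed(const unsigned int* const rgbaImage,
                                         unsigned char* const greyImage,
                                         int numRows, int numCols)
{
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    if (y < numRows && x < numCols) {
        int index = numCols * y + x;
        unsigned int p = rgbaImage[index];
        // Little-endian RGBA byte order: R is the lowest byte of the packed pixel
        unsigned char r = p & 0xff;
        unsigned char g = (p >> 8) & 0xff;
        unsigned char b = (p >> 16) & 0xff;
        greyImage[index] = (unsigned char)(0.299f * r + 0.587f * g + 0.114f * b);
    }
}
""")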
Below is the host code.
def gpu_rgb2gray(image):
    shape = image.shape
    n_rows, n_cols, _ = np.array(shape, dtype=np.int32)
    image_gray = np.empty((n_rows, n_cols), dtype=np.uint8)
    ## HERE is the confusing part: how to rearrange image to match uchar4* ??
    image = image.reshape(1, -1, 4)
    # Get kernel function
    rgba2gray = kernel.get_function("rgba_to_greyscale")
    # Define block, grid and compute
    blockDim = (32, 32, 1)  # 1024 threads in total
    dx, mx = divmod(shape[1], blockDim[0])
    dy, my = divmod(shape[0], blockDim[1])
    gridDim = (dx + (mx > 0), dy + (my > 0), 1)
    # Kernel call
    # HERE doesn't work because of the type mismatch
    rgba2gray(
        cuda.In(image), cuda.Out(image_gray), n_rows, n_cols,
        block=blockDim, grid=gridDim)
    return image_gray
Does anyone have any ideas? Thanks!
The gpuarray class has native support for CUDA's built-in vector types (including uchar4).
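These vector types are exposed as NumPy structured dtypes, which is what makes the mapping between uchar4* and a numpy array possible. A quick way to inspect one (assuming a standard PyCUDA install):
import pycuda.gpuarray as gpuarray
# uchar4 maps to a structured dtype of four unsigned bytes (fields x, y, z, w)
print(gpuarray.vec.uchar4)           # structured dtype with fields x, y, z, w
print(gpuarray.vec.uchar4.itemsize)  # 4 bytes, the same size as a uchar4 on the device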
So you can create a gpuarray instance with the correct dtype for the kernel, copy the host image to that gpuarray using buffers, and then use the gpuarray as the kernel input argument. As an example (and if I understood your code correctly), something like this should probably work:
import pycuda.gpuarray as gpuarray
....
def gpu_rgb2gray(image):
    shape = image.shape
    image_rgb = gpuarray.empty(shape, dtype=gpuarray.vec.uchar4)
    cuda.memcpy_htod(image_rgb.gpudata, image.data)
    image_gray = gpuarray.empty(shape, dtype=np.uint8)
    # Get kernel function
    rgba2gray = kernel.get_function("rgba_to_greyscale")
    # Define block, grid and compute
    blockDim = (32, 32, 1)  # 1024 threads in total
    dx, mx = divmod(shape[1], blockDim[0])
    dy, my = divmod(shape[0], blockDim[1])
    gridDim = (dx + (mx > 0), dy + (my > 0), 1)
    rgba2gray(image_rgb, image_gray, np.int32(shape[0]), np.int32(shape[1]),
              block=blockDim, grid=gridDim)
    img_gray = np.array(image_gray.get(), dtype=int)
    return img_gray
This would take an image of 32-bit unsigned integers, copy it to an array of uchar4 on the GPU, run the kernel, and then upcast the resulting array of unsigned char back to integers on the host.
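For completeness, the function above expects the host image as a (rows, cols) array of packed 32-bit pixels rather than a (rows, cols, 4) uint8 array, so a call might look like the following sketch (the file name is only a placeholder, and the image is assumed to already have an alpha channel):
import numpy as np
import matplotlib.pyplot as plt

rgba = plt.imread("input.png")            # placeholder path; any RGBA source works
if rgba.dtype != np.uint8:
    rgba = (rgba * 255).astype(np.uint8)  # imread can return float32 in [0, 1]

# Pack each RGBA pixel into one 32-bit element so image.data is rows*cols uchar4s
packed = np.ascontiguousarray(rgba).view(np.uint32).reshape(rgba.shape[0], rgba.shape[1])

gray = gpu_rgb2gray(packed)
plt.imshow(gray, cmap="gray")
plt.show()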