I have the following CUDA program, which converts an image from RGBA to greyscale in parallel. I also want a version that runs sequentially, so that I can compare the two and get metrics such as speedup.
From my understanding, to make this run sequentially I need to step through the image pixel by pixel using two for loops (one for X, one for Y), running the greyscale conversion on each pixel before moving on to the next one.
Whilst I have some idea of what I should be doing, I'm not really sure which part of the code I should be editing or how to get started.
Edit: I now understand that it is the kernel itself I need to be editing in order to make my program sequential.
The kernel is shown below:
__global__ void colorConvert(unsigned char * grayImage, unsigned char * rgbImage, unsigned int width, unsigned int height)
{
    unsigned int x = threadIdx.x + blockIdx.x * blockDim.x;
    //unsigned int y = threadIdx.y + blockIdx.y * blockDim.y; //this is needed if you use 2D grid and blocks
    //if ((x < width) && (y < height)) {
    //check if out of bounds
    if ((x < width*height)) {
        // get 1D coordinate for the grayscale image
        unsigned int grayOffset = x;// y*width + x; //this is needed if you use 2D grid and blocks
        // one can think of the RGB image as having
        // CHANNELS times as many columns as the grayscale image
        unsigned int rgbOffset = grayOffset*CHANNELS;
        unsigned char r = rgbImage[rgbOffset]; // red value for pixel
        unsigned char g = rgbImage[rgbOffset + 1]; // green value for pixel
        unsigned char b = rgbImage[rgbOffset + 2]; // blue value for pixel
        // perform the rescaling and store it
        // We multiply by floating point constants
        grayImage[grayOffset] = 0.21f*r + 0.71f*g + 0.07f*b;
    }
}
I have removed the rest of my code from the question as there was a lot of it to look through. If I want to make this kernel run sequentially, using two for loops to step through each pixel and apply the grayImage[grayOffset] line of code to each one, how would I go about doing it?
You need a for loop. Because your code treats the image as a 1D array of pixels, a single loop over all width*height pixels is enough; two nested loops are not required.
I think the loop can be written like this, in an ordinary host function (without __global__) that takes the same parameters as your kernel:
for (unsigned int x = 0; x < width*height; ++x)
{
    unsigned int grayOffset = x; // one iteration per pixel, replacing one thread per pixel
    unsigned int rgbOffset = grayOffset*CHANNELS;
    unsigned char r = rgbImage[rgbOffset]; // red value for pixel
    unsigned char g = rgbImage[rgbOffset + 1]; // green value for pixel
    unsigned char b = rgbImage[rgbOffset + 2]; // blue value for pixel
    // perform the rescaling and store it
    // We multiply by floating point constants
    grayImage[grayOffset] = 0.21f*r + 0.71f*g + 0.07f*b;
}
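If you would still rather keep the two-loop (X and Y) form from the question, here is a minimal sketch of a complete host function, plus a small timing helper for the speedup comparison. It is only an illustration under a few assumptions: CHANNELS is taken to be 4 (RGBA), since its definition is not shown in the question; both pointers refer to host memory, not device memory; and the names colorConvertSequential and timeSequentialMs are made up for this example.

```cpp
#include <chrono>
#include <cstddef>

// Assumption: 4 bytes per pixel (RGBA). The question does not show
// how CHANNELS is defined, so adjust this to match your own code.
constexpr unsigned int CHANNELS = 4;

// Hypothetical CPU counterpart of the colorConvert kernel.
// Both pointers must refer to host memory.
void colorConvertSequential(unsigned char* grayImage, const unsigned char* rgbImage,
                            unsigned int width, unsigned int height)
{
    for (unsigned int y = 0; y < height; ++y) {
        for (unsigned int x = 0; x < width; ++x) {
            // Row-major 1D offset, the same formula as the commented-out
            // 2D version of the kernel (y*width + x).
            std::size_t grayOffset = static_cast<std::size_t>(y) * width + x;
            std::size_t rgbOffset = grayOffset * CHANNELS;
            unsigned char r = rgbImage[rgbOffset];     // red value for pixel
            unsigned char g = rgbImage[rgbOffset + 1]; // green value for pixel
            unsigned char b = rgbImage[rgbOffset + 2]; // blue value for pixel
            // Same weights as the kernel; the result is truncated to a byte.
            grayImage[grayOffset] =
                static_cast<unsigned char>(0.21f * r + 0.71f * g + 0.07f * b);
        }
    }
}

// Hypothetical helper: runs the sequential conversion once and returns
// the elapsed wall-clock time in milliseconds.
double timeSequentialMs(unsigned char* grayImage, const unsigned char* rgbImage,
                        unsigned int width, unsigned int height)
{
    auto start = std::chrono::steady_clock::now();
    colorConvertSequential(grayImage, rgbImage, width, height);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

The speedup is then the sequential time divided by the GPU time. For a fair comparison you will probably want to decide explicitly whether the GPU time includes the host-to-device and device-to-host copies, and to average over several runs, since a single run on a small image can be dominated by noise.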