I am writing a DirectShow filter that changes the contrast and brightness of every frame. The pointer to the first pixel of the frame is: RGBTRIPLE *prgb = (RGBTRIPLE*) pData;
Also, int numPixels = cxImage * cyImage;
is the number of pixels per frame.
#pragma omp parallel for
for (int iPixel = 0; iPixel < numPixels; iPixel++) {
    prgb->rgbtGreen = prgb->rgbtGreen * _contrastPower + _brightnessPower;
    prgb->rgbtBlue = prgb->rgbtBlue * _contrastPower + _brightnessPower;
    prgb->rgbtRed = prgb->rgbtRed * _contrastPower + _brightnessPower;
    if (prgb->rgbtGreen > 255) prgb->rgbtGreen = 255;
    if (prgb->rgbtBlue > 255) prgb->rgbtBlue = 255;
    if (prgb->rgbtRed > 255) prgb->rgbtRed = 255;
    prgb++;
}
The output stream is garbled. If two threads use and increment the same pointer, they will of course race and cause weird problems.
I also tried removing the int iPixel and iterating with the pointer only, but couldn't get the syntax right.
Is it possible to make a parallel for loop while using pointer operations? If so, how?
The problem is that prgb is a shared pointer: incrementing it from every thread without any synchronization leads to a data race. Instead, derive each pixel's address from the loop index, similar to this:
#pragma omp parallel for schedule(static)
for (int iPixel = 0; iPixel < numPixels; iPixel++) {
    RGBTRIPLE *ppixel = prgb + iPixel;
    // Compute into int temporaries first: the rgbt* members are BYTEs,
    // so assigning a value above 255 would wrap around before the
    // comparison against 255 ever runs.
    int green = ppixel->rgbtGreen * _contrastPower + _brightnessPower;
    int blue  = ppixel->rgbtBlue  * _contrastPower + _brightnessPower;
    int red   = ppixel->rgbtRed   * _contrastPower + _brightnessPower;
    ppixel->rgbtGreen = green > 255 ? 255 : green;
    ppixel->rgbtBlue  = blue  > 255 ? 255 : blue;
    ppixel->rgbtRed   = red   > 255 ? 255 : red;
}
The algorithm is memory-bound on modern CPUs, so do not expect performance to scale linearly with the number of threads if the image data does not fit entirely in the CPU cache.