I am writing a DirectShow filter that changes the contrast and brightness of every frame. The pointer to the first pixel of the frame is: RGBTRIPLE *prgb = (RGBTRIPLE*) pData;
Also, int numPixels = cxImage * cyImage;
is the number of pixels per frame.
#pragma omp parallel for
for (int iPixel = 0; iPixel < numPixels; iPixel++) {
    prgb->rgbtGreen = prgb->rgbtGreen * _contrastPower + _brightnessPower;
    prgb->rgbtBlue = prgb->rgbtBlue * _contrastPower + _brightnessPower;
    prgb->rgbtRed = prgb->rgbtRed * _contrastPower + _brightnessPower;
    if (prgb->rgbtGreen > 255) prgb->rgbtGreen = 255;
    if (prgb->rgbtBlue > 255) prgb->rgbtBlue = 255;
    if (prgb->rgbtRed > 255) prgb->rgbtRed = 255;
    prgb++;
}
The output stream is garbled. If two threads use and increment the same pointer, they will of course race and cause weird problems.
I also tried removing the int iPixel and iterating with the pointer only, but couldn't get the syntax right.
Is it possible to make a parallel for loop while using pointer operations? If so, how?
The problem is that prgb is a shared pointer: incrementing it from every thread without any synchronization leads to a data race. Instead, derive each pixel's address from the loop index, similar to this:
#pragma omp parallel for schedule(static)
for (int iPixel = 0; iPixel < numPixels; iPixel++) {
    RGBTRIPLE *ppixel = prgb + iPixel;
    // Compute into int temporaries first: the rgbt* members are BYTEs,
    // so assigning a value above 255 would wrap around before the
    // comparison against 255 ever runs.
    int green = ppixel->rgbtGreen * _contrastPower + _brightnessPower;
    int blue  = ppixel->rgbtBlue  * _contrastPower + _brightnessPower;
    int red   = ppixel->rgbtRed   * _contrastPower + _brightnessPower;
    ppixel->rgbtGreen = green > 255 ? 255 : green;
    ppixel->rgbtBlue  = blue  > 255 ? 255 : blue;
    ppixel->rgbtRed   = red   > 255 ? 255 : red;
}
The algorithm is memory-bound on modern CPUs, so do not expect performance to scale linearly with the number of threads if the image data does not fit entirely in the CPU cache.