Tags: c, loops, optimization, conditional-statements

Optimisation of loops containing conditionals with expensive math functions


I have a loop like:

#pragma omp parallel for num_threads(threads) schedule(static)
for (int i = 0; i < ndata; i++) {
    result[i] = calculate(data[i]);
}

with (a simplified version of) the function calculate() being:

double calculate(double in) {
    double out;
    if (in < LP) {             /* lowest segment: linear */
        out = c.b1 * in;
    } else if (in < SP) {      /* second segment: power law */
        out = c.a2 + c.b2 * pow(c.c2 + c.d2 * in, c.e2);
    } else if (in < HP) {      /* third segment: power law */
        out = c.a3 + c.b3 * pow(c.c3 + c.d3 * in, c.e3);
    } else {                   /* highest segment: linear */
        out = c.a4 + c.b4 * in;
    }
    return out;
}

All calculation variables are doubles. It's an image-processing routine, so ndata can be 3 × the number of pixels, which is ~1E8 for modern cameras, and I'm trying to make the routine as responsive as possible. Depending on the sub-pixel value being processed, the calculation needed is either simple addition/multiplication or a more expensive call to pow(). I've already done a lot of precalculation outside the loop, and I'm using OpenMP to parallelise it, but is there anything more I can do to optimise this? I'm guessing it won't auto-vectorise particularly well, given that successive passes round the loop can mix pow() calls with simple calculations.


Solution

  • Consider using arrays in your struct instead of numbered member names (a1 ... a4, b1 ... b4, and so on). That will allow you to do something like:

    for (int i = 0; i < ndata; i++) {
        /* Branchless segment selection: each comparison yields 0 or 1,
           so j ends up as the index of the segment data[i] falls in. */
        size_t j = 0;
        j += (data[i] >= LP);
        j += (data[i] >= SP);
        j += (data[i] >= HP);
        result[i] = c.a[j] + c.b[j] * pow(c.c[j] + c.d[j] * data[i], c.e[j]);
    }

    

    Then just populate those arrays with 0.0 and 1.0 as appropriate (the variables are doubles, so plain 0.0/1.0 rather than 0.0f/1.0f), so that the two linear branches reduce to the same a + b * pow(c + d * in, e) form.
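
    As an illustration, here is one way that struct and its initialization could look. This is only a sketch: the Coeffs type and make_coeffs function are invented names, not from the original post; only the coefficient values come from the original calculate().

    typedef struct {
        double a[4], b[4], c[4], d[4], e[4];
    } Coeffs;

    static Coeffs make_coeffs(double b1,
                              double a2, double b2, double c2, double d2, double e2,
                              double a3, double b3, double c3, double d3, double e3,
                              double a4, double b4)
    {
        Coeffs k;
        /* Segment 0: b1*in  ==  0.0 + b1 * pow(0.0 + 1.0*in, 1.0) */
        k.a[0] = 0.0; k.b[0] = b1; k.c[0] = 0.0; k.d[0] = 1.0; k.e[0] = 1.0;
        /* Segments 1 and 2 keep their original power-law coefficients. */
        k.a[1] = a2;  k.b[1] = b2; k.c[1] = c2;  k.d[1] = d2;  k.e[1] = e2;
        k.a[2] = a3;  k.b[2] = b3; k.c[2] = c3;  k.d[2] = d3;  k.e[2] = e3;
        /* Segment 3: a4 + b4*in  ==  a4 + b4 * pow(0.0 + 1.0*in, 1.0) */
        k.a[3] = a4;  k.b[3] = b4; k.c[3] = 0.0; k.d[3] = 1.0; k.e[3] = 1.0;
        return k;
    }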

    From there it's just a matter of optimizing a specialized pow function for inlining and vectorization. As a bonus, this should run in constant time as long as your pow function does, though at the cost of possibly unnecessary pow() work for a good portion of the data; whether that trade-off pays off will depend on the data set.
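
    As a minimal sketch of such a specialized pow, assuming the base c.c[j] + c.d[j] * data[i] stays strictly positive over each segment (the exp/log identity below is undefined otherwise):

    #include <math.h>

    /* pow(x, e) == exp(e * log(x)) for x > 0. Written this way, a
       compiler with a vector math library (e.g. GCC with -Ofast and
       glibc's libmvec) may vectorize the exp/log calls across the loop. */
    static inline double fast_pow(double x, double e)
    {
        return exp(e * log(x));
    }

    One caveat: with the 0.0/1.0 encoding above, the linear segments evaluate pow(in, 1.0), so this identity breaks for zero or negative inputs; if the data can be non-positive, fall back to the library pow for those segments or handle them separately.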