c++ vectorization boolean-logic boolean-expression boolean-operations

How to convert an if/else if chain of relations into linear looping code

I have the core of an algorithm that I want to convert from what is essentially an if/else if chain about 20 deep into a loop that can be executed in linear fashion. Each conditional is simple and is one of 4 possibilities: (A[i] < B[j]), (A[i] <= B[j]), (A[i] > B[j]), (A[i] >= B[j]). How can I convert all of them to a single conditional? For example, the chain could look something like this:

if (A[i+0] <  B[j+0]) break;
if (A[i+1] <= B[j+1]) break;
if (A[i+2] >  B[j+2]) break;
if (A[i+3] >= B[j+3]) break;
if (A[i+4] >= B[j+4]) break;
....

Each conditional can be any 1 of the 4 possibilities, but I want to convert them all into a single sequence of steps without a case distinction, so that it can be done in a loop (or possibly in parallel with vector intrinsics).

// Given a list R[] of 4 possible relations, loop over all the data
int result = 1;
for (i = 0; i < num_relations && result; ++i) {
    // How do I convert this to linear code which does the equivalent of
    // (the value of R[i] and which relation it maps to is flexible; this is an example)
    switch (R[i]) {
        case 0: result = A[i] <  B[i]; break;
        case 1: result = A[i] <= B[i]; break;
        case 2: result = A[i] >  B[i]; break;
        case 3: result = A[i] >= B[i]; break;
    }
}

Some properties (for unsigned numbers) that could possibly be used are:

(A > B) ^ 1 === (A <= B) ^ 0

Can the above be optimized to something better than this?

result = 1;
for (i = 0; i < num_relations && result; ++i) {           
   result = ((A[i] <  B[i]) && (R[i] == 0)) ||
            ((A[i] <= B[i]) && (R[i] == 1)) ||
            ((A[i] >  B[i]) && (R[i] == 2)) ||
            ((A[i] >= B[i]) && (R[i] == 3));
}

Solution

Without vectorization, your if() sequence is as fast as it can get. You need one compare instruction per condition, and there is no way around that (even though some machines can optimize away all of the branches except one).

With vectorization, you can perform multiple comparisons in parallel, provided that they are all in the same direction. That can be achieved by transforming your input values. For signed integers (with values far enough from the type's limits that negation and +1 cannot overflow), A > B is the same as -A < -B, and A <= B is the same as A < B + 1, so the sign has to be applied to both operands:

int signs[] = {1, 1, -1, -1, -1, ...};
int equals[] = {0, 1, 0, 1, 1, ...};
if (signs[0]*A[i+0] <  signs[0]*B[j+0] + equals[0]) break;
if (signs[1]*A[i+1] <  signs[1]*B[j+1] + equals[1]) break;
if (signs[2]*A[i+2] <  signs[2]*B[j+2] + equals[2]) break;
if (signs[3]*A[i+3] <  signs[3]*B[j+3] + equals[3]) break;
if (signs[4]*A[i+4] <  signs[4]*B[j+4] + equals[4]) break;
...
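
To make the sign/equals encoding concrete, here is a minimal sketch of a single transformed comparison. The function name relationHolds and the choice of 32-bit signed inputs are assumptions for illustration only; the arithmetic is widened to 64 bits so that negation and the +1 cannot overflow.

```cpp
#include <cstdint>

// Hypothetical helper, not part of the original code.
// Encoding, matching the signs/equals tables above:
//   <  : sign = +1, equal = 0
//   <= : sign = +1, equal = 1
//   >  : sign = -1, equal = 0
//   >= : sign = -1, equal = 1
// Assumes signed 32-bit inputs; this trick does not carry over to
// unsigned values, because they cannot be negated meaningfully.
inline bool relationHolds(std::int32_t a, std::int32_t b, int sign, int equal) {
    // The sign is applied to BOTH operands: a > b is the same as -a < -b.
    const std::int64_t lhs = static_cast<std::int64_t>(sign) * a;
    // Adding 1 on the right turns the strict "<" into "<=" for integers.
    const std::int64_t rhs = static_cast<std::int64_t>(sign) * b + equal;
    return lhs < rhs;
}
```

For example, relationHolds(3, 2, -1, 0) evaluates -3 < -2, which is the same as 3 > 2.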

However, vectorization of this code should fail, because the compiler is not allowed to load A[i+1] from memory before the first condition has been evaluated and found to be false. So you need to make the condition evaluations independent of each other:

// signs[k] is applied to both operands: A > B is the same as -A < -B (signed integers)
int signs[] = {1, 1, -1, -1, -1, ...};
int equals[] = {0, 1, 0, 1, 1, ...};
int doBreak = 0;
doBreak |= (signs[0]*A[i+0] <  signs[0]*B[j+0] + equals[0]);
doBreak |= (signs[1]*A[i+1] <  signs[1]*B[j+1] + equals[1]);
doBreak |= (signs[2]*A[i+2] <  signs[2]*B[j+2] + equals[2]);
doBreak |= (signs[3]*A[i+3] <  signs[3]*B[j+3] + equals[3]);
doBreak |= (signs[4]*A[i+4] <  signs[4]*B[j+4] + equals[4]);
...
if (doBreak) break;

Now you are free to make a loop out of it.
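
As a rough sketch of what that loop could look like, assuming 32-bit signed inputs and a hypothetical function name anyRelationHolds (neither comes from the question): every iteration is independent and there is no early exit, which is what gives the compiler a chance to vectorize it. Whether it actually does depends on the compiler and flags.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of the original code.
// Returns true if any of the n encoded relations holds, i.e. the case in
// which the original if-chain would have executed `break`.
// signs[k] is +1 or -1 and equals[k] is 0 or 1, as in the tables above.
// Assumes signed 32-bit inputs; the arithmetic is widened to 64 bits so
// that negation and the +1 cannot overflow.
inline bool anyRelationHolds(const std::int32_t* a, const std::int32_t* b,
                             const int* signs, const int* equals,
                             std::size_t n) {
    int doBreak = 0;
    for (std::size_t k = 0; k < n; ++k) {
        // Apply the sign to both operands so every test becomes "<".
        const std::int64_t lhs = static_cast<std::int64_t>(signs[k]) * a[k];
        const std::int64_t rhs = static_cast<std::int64_t>(signs[k]) * b[k] + equals[k];
        // Accumulate instead of branching: no iteration depends on another.
        doBreak |= (lhs < rhs);
    }
    return doBreak != 0;
}
```

The caller would pass A + i and B + j as the two data pointers and break out of the surrounding loop when the function returns true. The trade-off is that all n comparisons are always evaluated, even when the first one already holds.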