I was reading this question Why is it faster to process a sorted array than an unsorted array? and the best answer provided by Mysticial. The answer does a great job of explaining what is happening and why and also says this:
So what can be done?
If the compiler isn't able to optimize the branch into a conditional move, you can try some hacks if you are willing to sacrifice readability for performance.
Replace:
if (data[c] >= 128)
sum += data[c];
with:
int t = (data[c] - 128) >> 31;
sum += ~t & data[c];
This eliminates the branch and replaces it with some bitwise operations.
What exactly is this code doing and why is it equivalent?
It first converts the comparison to subtraction with 128.
Then the sign of the result (whether subtracting went below 128) is expanded to either all zeroes or all ones, which is and
ed to the value being added, zeroing it out if the result of the subtraction was negative.