Search code examples
coptimizationpipelinebranch-prediction

Turning loop into arithmetics to speed up function


Hi I am on my way to optimize a function that is supposed to give me the "next" of something. So far what I have got is

  int fun(int a){
    const int k = ...;
    for(;test_value(a++) != k;);
    return a;
   }

This was a quick and dirty way to test that my algorithm actually worked but now afterwards I am worried that the loop makes a test for branching on every iteration (if not the compiler is very good at handling it behind the scenes?). Let us say the chance that any a fulfils the test is at most 1/5 and worst cases one in a million but that test_value is just a clock cycle or two. Is there some systematic way I can help my compiler trade all the branches with arithmetics to better utilize the CPU pipelines?


Solution

  • You could "unroll" your loop a bit, something like:

    int nomatch = 1;
    while( nomatch ){
        nomatch   = (test_value(a++) != k);
        nomatch &&= (test_value(a++) != k);
        nomatch &&= (test_value(a++) != k);
        nomatch &&= (test_value(a++) != k);
        nomatch &&= (test_value(a++) != k);
    }
    

    This would yield fewer iterations, and short-circuiting would prevent the evaluations of test_value once a match has been found.

    Like your original code, this assumes that a match will be found at some point.