c++ caching optimization

Why is this code considered optimized?


I am working on optimizing some code and came across this. Could someone tell me why this piece of code is more 'optimized'

for (long i = 0; i < 1000; i += 2) {
    float var = numberOfEggs*arrayX[i] + arrayY[i];
    arrayY[i+1] = var;
    arrayY[i+2] = numberOfEggs*arrayX[i+1] + var;
}

than this version?

for (long i = 0; i < 1000; ++i)
    arrayY[i+1] = numberOfEggs*arrayX[i] + arrayY[i];

Any help is appreciated, thank you!


Solution

  • The first example performs two assignments per iteration. You can tell from the increment, i+=2, which advances the index by two on every pass.

    This is called loop unrolling. By performing two assignments per iteration, you halve the number of iterations, and with it the number of loop-control branches executed.

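    Most processors don't like branch instructions. At each branch the processor has to predict whether it will be taken so that it can keep fetching instructions (branch prediction); a wrong guess means flushing the pipeline and refetching. Depending on how the compiler lays out the loop, there are one or two branches per iteration: a conditional branch for the comparison and, in the straightforward layout, an unconditional jump back to the comparison.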

    To experiment, try using 4 assignments per iteration, and profile.
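
The experiment suggested above could look roughly like the following. This is only a sketch under stated assumptions: the function names `scale_add_simple`, `scale_add_unrolled4`, and `time_ns` are made up for illustration, `k` stands in for `numberOfEggs`, and `std::vector<float>` replaces the arrays from the question, whose real types are not shown. The unrolled version keeps the running value in locals, as the question's two-step version does with `var`, and falls back to a plain loop for any leftover iterations.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Baseline from the question: one assignment per iteration.
// Requires x.size() >= n and y.size() >= n + 1.
void scale_add_simple(const std::vector<float>& x, std::vector<float>& y,
                      float k, std::size_t n) {
    assert(x.size() >= n && y.size() >= n + 1);
    for (std::size_t i = 0; i < n; ++i)
        y[i + 1] = k * x[i] + y[i];
}

// Hypothetical 4x unrolled variant: four assignments per iteration,
// so the loop condition is evaluated roughly a quarter as often.
void scale_add_unrolled4(const std::vector<float>& x, std::vector<float>& y,
                         float k, std::size_t n) {
    assert(x.size() >= n && y.size() >= n + 1);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        // Each value feeds the next one, exactly as in the simple loop.
        float v1 = k * x[i]     + y[i];
        float v2 = k * x[i + 1] + v1;
        float v3 = k * x[i + 2] + v2;
        float v4 = k * x[i + 3] + v3;
        y[i + 1] = v1;
        y[i + 2] = v2;
        y[i + 3] = v3;
        y[i + 4] = v4;
    }
    // Remainder loop for when n is not a multiple of 4.
    for (; i < n; ++i)
        y[i + 1] = k * x[i] + y[i];
}

// Crude wall-clock timer (illustrative helper): runs f `repeats` times
// and returns the total elapsed time in nanoseconds.
template <typename F>
long long time_ns(F&& f, int repeats) {
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Timing both functions with something like `time_ns([&] { scale_add_unrolled4(x, y, k, 1000); }, 100000)` gives a first impression, but treat the numbers with care: an optimizing compiler may already unroll the simple loop on its own, so the difference can vanish at higher optimization levels. A real profiler and a look at the generated assembly are more reliable than a hand-rolled timer.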