Search code examples
c++performancetheory

How can you factor out branching from tight loop?


My question is: How can I add features to my processing loop without the overhead of checking the true/falseness of the user settings to navigate the branches? The settings are the same for all iterations on the loop. Do modern processors with branch prediction make this unnecessary?

My programs over follow this pattern:

  1. User adjusts settings, checkboxes, sliders, numerical entries.
  2. Data is processed when an update is triggered
    1. Apply settings to local variables
    2. Run loop over a large dataset
      • add if statements to bypass unused code from the user settings.
      • return from loop
    3. return transformed data

How can you template or inline out all permutations ahead of time?

example:

bool setting1 = true;
bool setting2 = false;
vector<float> data;

for(int i=0;i<100000;i++)
    data.push_back(i);

for(int i=0;i<100000;i++) {
    if (setting1)
    {
       doStuff(data[i]);
       ....
    }
    if (setting2)
    {
       doMoreStuff(data[i]);
       .....
    }

    .... //etc
}

I know this is a silly example. But I'd like to know what pattern scales when there are lots of branches.


Solution

  • Use templates for the main loop.

    template <bool A, bool B>
    void loop() {
      while (1) {
          if (A) // will get compiled out if A == false
          {
             doStuff(data[i]);
             ....
          }
          if (B)
          {
             doMoreStuff(data[i]);
             .....
          }
    
          .... //etc
      }
    }
    

    When you change settings: (you could probably make this less code)

    if (setting1) {
      if (setting2)
        loop<1,1>;
      else 
        loop<1,0>;
    }
    else {
      if (setting2)
        loop<0,1>;
      else 
        loop<0,0>;
    }
    

    You want to stay in loop() until settings change.

    This should be used with care as it can lead to bloat.


    Profiled the answers (G++ O2 optimization):

     %Time
     46.15      0.84     0.84                             fb() (blackbear)
     38.37      1.53     0.69                             fa() (OP)
     16.13      1.82     0.29                             fc() (pubby8)