Search code examples
c++algorithmloopsinlinecompiler-optimization

How to repeat a code segment without the use of function or class for high performance loop in C++


My program in C++11 is performing an online processing of serialized data and the loop need to run over millions of memory positions. Efficiency on the computation is a must and my concern is that by calling a function or a class within such a loop will create unnecessary operations that will affect efficiency, e.g. transfer of several pointer values needed for the operation between different variable scopes.

To exemplify, let's consider the following dummy example where "something" is the operation that is repeated. Please note that the code within "something" uses the variables within the loop scope.

do {
    something(&span,&foo);
    spam++
    foo++
    if ( spam == spam_spam ) {
      something(&span,&foo);
      other_things(&span,&foo);
      something(&span,&foo);
    }
    else {
      something(&span,&foo);
      still_other_things(&span,&foo);
      something(&span,&foo);
    }
}
while (foo<bar);

Is there a way to repeat a code block and avoid variables being moved and copied around using unnecessary operations? Does the use of functions and classes on such loops actually implies on additional operations and how one could avoid it?


Update

As suggested, I ran some tests with the code presented below. I tested several options on how to call a simple increment 100 million times. I'm using GCC over RHEL 7 Server 7.6 on a x86_64 virtual machine under Hyper-V.

Initially, compiling with "g++ -std=c++17 -o test.o test.cpp"

  • Simple loop computation (baseline): 211.046ms

  • Inline Function: 468.768ms

  • Lambda Function: 253.466ms

  • Define Macro: 211.995ms

  • Function passing values: 466.986ms

  • Function passing pointers: 344.646ms

  • Function with void: 190.557ms

  • Object method operating with members: 231.458ms

  • Object method passing values: 227.615ms

From these results I realized that the compiler was not taking the inline suggestion, even after trying to bulge it to do it as suggested at g++ doesn't inline functions

Later, as suggested on the answer from Mat on the same post I turned on the compiler optimizations using "g++ -std=c++17 -O2 -o test.o test.cpp" and got the following result for the same the number of iterations compared with the test without optimization.

  • Simple loop computation (baseline): 62.9254ms

  • Inline Function: 65.0564ms

  • Lambda Function: 32.8637ms

  • Define Macro: 63.0299ms

  • Function passing values: 64.2876ms

  • Function passing pointers: 63.3416ms

  • Function with void: 32.1073ms

  • Object method operating with members: 63.3847ms

  • Object method passing values: 62.5151ms

Conclusion to this point:

  • inline function is not good alternative because one can not be sure on how the compiler will really take it and the result might be as bad as using a standard function.

  • "define macros" and "lambda functions" are better alternatives to inline. Each has its benefits and features, #define being more flexible.

  • using object members and methods seams a good balance to solve the problem on any situation while maintaining the code in a form that is easier to maintain and optimize.

  • tweaking the compiler is worth it;

Follows the code used for testing:

// Libraries
    #include <iostream>
    #include <cmath>
    #include <chrono>

// Namespaces
    using namespace std;
    using namespace std::chrono;

// constants that control program behaviour
    const long END_RESULT = 100000000;
    const double AVERAGING_LENGTH = 40.0;
    const int NUMBER_OF_ALGORITHM = 9;
    const long INITIAL_VALUE = 0;
    const long INCREMENT = 1;

// Global variables used for test with void function and to general control of the program;
    long global_variable;
    long global_increment;

// Function that returns the execution time for a simple loop
int64_t simple_loop_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // Perform the computation for baseline
        do {
            local_variable += local_increment;
        } while ( local_variable != END_RESULT);

    // Stop the clock to measure performance of the silly version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return(duration_cast<microseconds>( timer_stop - timer_start ).count());
}

// Functions that computes the execution time when using inline code within the loop
inline long increment_variable() __attribute__((always_inline));
inline long increment_variable(long local_variable, long local_increment) {
    return local_variable += local_increment;
}

int64_t inline_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // Perform the computation for baseline
        do {
            local_variable = increment_variable(local_variable,local_increment);
        } while ( local_variable != END_RESULT);

    // Stop the clock to measure performance of the silly version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return duration_cast<microseconds>( timer_stop - timer_start ).count();
}

// Functions that computes the execution time when using lambda code within the loop
int64_t labda_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // define lambda function
        auto lambda_increment = [&] {
            local_variable += local_increment;
        };

    // Perform the computation for baseline
        do {
            lambda_increment();
        } while ( local_variable != END_RESULT);

    // Stop the clock to measure performance of the silly version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return duration_cast<microseconds>( timer_stop - timer_start ).count();
}

// define lambda function
    #define define_increment() local_variable += local_increment;

// Functions that computes the execution time when using lambda code within the loop
int64_t define_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // Perform the computation for baseline
        do {
            define_increment();
        } while ( local_variable != END_RESULT);

    // Stop the clock to measure performance of the silly version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return duration_cast<microseconds>( timer_stop - timer_start ).count();
}
// Functions that compute the execution time when calling a function within the loop passing variable values
long increment_with_values_function(long local_variable, long local_increment) {
    return local_variable += local_increment;
}

int64_t function_values_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // Perform the computation for baseline
        do {
            local_variable = increment_with_values_function(local_variable,local_increment);
        } while ( local_variable != END_RESULT);

    // Stop the clock to measure performance of the silly version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return duration_cast<microseconds>( timer_stop - timer_start ).count();
}
// Functions that compute the execution time when calling a function within the loop passing variable pointers
long increment_with_pointers_function(long *local_variable, long *local_increment) {
    return *local_variable += *local_increment;
}

int64_t function_pointers_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // Perform the computation for baseline
        do {
            local_variable = increment_with_pointers_function(&local_variable,&local_increment);
        } while ( local_variable != END_RESULT);

    // Stop the clock to measure performance of the silly version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return duration_cast<microseconds>( timer_stop - timer_start ).count();
}
// Functions that compute the execution time when calling a function within the loop without passing variables 
void increment_with_void_function(void) {
    global_variable += global_increment;
}

int64_t function_void_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // set global variables
        global_variable = local_variable;
        global_increment = local_increment;

    // Perform the computation for baseline
        do {
            increment_with_void_function();
        } while ( global_variable != END_RESULT);

    // Stop the clock to measure performance of the silly version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return duration_cast<microseconds>( timer_stop - timer_start ).count();
}
// Object and Function that compute the duration when using a method of the object where data is stored without passing variables
struct object {
    long object_variable = 0;
    long object_increment = 1;

    object(long local_variable, long local_increment) {
        object_variable = local_variable;
        object_increment = local_increment;
    }

    void increment_object(void){
        object_variable+=object_increment;
    }

    void increment_object_with_value(long local_increment){
        object_variable+=local_increment;
    }
};

int64_t object_members_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // Create object
        object object_instance = {local_variable,local_increment};

    // Perform the computation for baseline
        do {
            object_instance.increment_object();
        } while ( object_instance.object_variable != END_RESULT);

    // Get the results out of the object
        local_variable = object_instance.object_variable;

    // Stop the clock to measure performance of the silly version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return duration_cast<microseconds>( timer_stop - timer_start ).count();
}

// Function that compute the duration when using a method of the object where data is stored passing variables
int64_t object_values_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // Create object
        object object_instance = {local_variable,local_increment};

    // Perform the computation for baseline
        do {
            object_instance.increment_object_with_value(local_increment);
        } while ( object_instance.object_variable != END_RESULT);

    // Get the results out of the object
        local_variable = object_instance.object_variable;

    // Stop the clock to measure performance of the silly version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return duration_cast<microseconds>( timer_stop - timer_start ).count();
}

int main() {

    // Create array to store execution time results for all tests
        pair<string,int64_t> duration_sum[NUMBER_OF_ALGORITHM]={
            make_pair("Simple loop computation (baseline): ",0.0),
            make_pair("Inline Function: ",0.0),
            make_pair("Lambda Function: ",0.0),
            make_pair("Define Macro: ",0.0)
            make_pair("Function passing values: ",0.0),
            make_pair("Function passing pointers: ",0.0),
            make_pair("Function with void: ",0.0),
            make_pair("Object method operating with members: ",0.0),
            make_pair("Object method passing values: ",0.0),
        };

    // loop to compute average of several execution times
        for ( int i = 0; i < AVERAGING_LENGTH; i++) {
            // Compute the execution time for a simple loop as the baseline
                duration_sum[0].second = duration_sum[0].second + simple_loop_computation(INITIAL_VALUE, INCREMENT);

            // Compute the execution time when using inline code within the loop (expected same as baseline)
                duration_sum[1].second = duration_sum[1].second + inline_computation(INITIAL_VALUE, INCREMENT);

            // Compute the execution time when using lambda code within the loop (expected same as baseline)
                duration_sum[2].second = duration_sum[2].second + labda_computation(INITIAL_VALUE, INCREMENT);

            // Compute the duration when using a define macro
                duration_sum[3].second = duration_sum[3].second + define_computation(INITIAL_VALUE, INCREMENT);

            // Compute the execution time when calling a function within the loop passing variables values
                duration_sum[4].second = duration_sum[4].second + function_values_computation(INITIAL_VALUE, INCREMENT);

            // Compute the execution time when calling a function within the loop passing variables pointers
                duration_sum[5].second = duration_sum[5].second + function_pointers_computation(INITIAL_VALUE, INCREMENT);

            // Compute the execution time when calling a function within the loop without passing variables
                duration_sum[6].second = duration_sum[6].second + function_void_computation(INITIAL_VALUE, INCREMENT);

            // Compute the duration when using a method of the object where data is stored without passing variables
                duration_sum[7].second = duration_sum[7].second + object_members_computation(INITIAL_VALUE, INCREMENT);

            // Compute the duration when using a method of the object where data is stored passing variables
                duration_sum[8].second = duration_sum[8].second + object_values_computation(INITIAL_VALUE, INCREMENT);
        }


        double average_baseline_duration = 0.0;

    // Print out results
        for ( int i = 0; i < NUMBER_OF_ALGORITHM; i++) {
        // compute averave from sum
            average_baseline_duration = ((double)duration_sum[i].second/AVERAGING_LENGTH)/1000.0;

        // Print the result
            cout << duration_sum[i].first << average_baseline_duration << "ms \n";
        }

    return 0;
}

Solution

  • If the code is short enough, it can be declared inline and the compiler will put it inline. If it's not, well, then it probably won't help to repeat it.

    But, honestly, this is the least effective form of optimisation anyway. Concern yourself with efficient algorithms and cache efficient data structures.