Search code examples
c++arraysperformancebenchmarkingmicrobenchmark

Why is the printing of array elements in average slower than the printing of single objects in C++?


I made a test to see the differences between accessing array elements and single objects by printing their values into the CLI:

#include <iostream>
#include <chrono>
#include <iomanip>

int main()
{
    int a[10] = {1,2,3,4,5,6,7,8,9,10};

    int v1 = 1;
    int v2 = 2;
    int v3 = 3;
    int v4 = 4;
    int v5 = 5;
    int v6 = 6;
    int v7 = 7;
    int v8 = 8;
    int v9 = 9;
    int v10 = 10;

    std::cout << "Array output:" << std::endl << std::endl;

    auto t_start1 = std::chrono::high_resolution_clock::now();

    std::cout << "1. value: " << a[0] << std::endl;
    std::cout << "2. value: " << a[1] << std::endl;
    std::cout << "3. value: " << a[2] << std::endl;
    std::cout << "4. value: " << a[3] << std::endl;
    std::cout << "5. value: " << a[4] << std::endl;
    std::cout << "6. value: " << a[5] << std::endl;
    std::cout << "7. value: " << a[6] << std::endl;
    std::cout << "8. value: " << a[7] << std::endl;
    std::cout << "9. value: " << a[8] << std::endl;
    std::cout << "10. value: " << a[9] << std::endl;

    auto t_end1 = std::chrono::high_resolution_clock::now();

    std::cout << std::endl;

    std::cout << "Variable output:" << std::endl << std::endl;

    auto t_start2 = std::chrono::high_resolution_clock::now();

    std::cout << "1. value: " << v1 << std::endl;
    std::cout << "2. value: " << v2 << std::endl;
    std::cout << "3. value: " << v3 << std::endl;
    std::cout << "4. value: " << v4 << std::endl;
    std::cout << "5. value: " << v5 << std::endl;
    std::cout << "6. value: " << v6 << std::endl;
    std::cout << "7. value: " << v7 << std::endl;
    std::cout << "8. value: " << v8 << std::endl;
    std::cout << "9. value: " << v9 << std::endl;
    std::cout << "10. value: " << v10 << std::endl;


    auto t_end2 = std::chrono::high_resolution_clock::now();

    std::cout<< std::endl << "Time passed with array: "
              << std::chrono::duration<double, std::milli>(t_end1-t_start1).count()
              << " ms\n" << std::endl;
    std::cout<< std::endl << "Time passed with variables: "
              << std::chrono::duration<double, std::milli>(t_end2-t_start2).count()
              << " ms\n" << std::endl;

    return 0;
}

On the first implementation (MingW/g++ under Windows 10,cmd.exe), the printing of values inside array elements is in average 3 milliseconds slower than using single scalar objects:

Table for Windows, g++/MingW:

                Array Elements:            Single Objects:   

1. Run          13.9609 ms                 9.529 ms
2. Run          11.9031 ms                 8.0936 ms
3. Run          13.3706 ms                 9.5264 ms
4. Run          12.5302 ms                 8.4723 ms
5. Run          14.4679 ms                 9.9688 ms
6. Run          12.3989 ms                 8.4326 ms
7. Run          12.8719 ms                 10.1851 ms
8. Run          10.9138 ms                 7.4481 ms
9. Run          12.8971 ms                 9.4094 ms
10. Run         11.9045 ms                 7.9391 ms
11. Run          9.9192 ms                 8.4047 ms
12. Run         13.4106 ms                 10.0296 ms

On the second implementation (g++ under Linux Ubuntu), the printing of values inside array elements is 3 microseconds in average slower than using single scalar objects:

Table for Linux Ubuntu, g++:

                Array Elements:            Single Objects:   

1. Run          0.013 ms                   0.008 ms
2. Run          0.012 ms                   0.007 ms
3. Run          0.013 ms                   0.008 ms
4. Run          0.014 ms                   0.009 ms
5. Run          0.012 ms                   0.008 ms
6. Run          0.013 ms                   0.008 ms
7. Run          0.013 ms                   0.009 ms
8. Run          0.014 ms                   0.009 ms
9. Run          0.012 ms                   0.008 ms
10. Run         0.013 ms                   0.009 ms
11. Run         0.012 ms                   0.009 ms
12. Run         0.012 ms                   0.008 ms 

My Question:

  • Why is the printing of values in array elements in average slower than the printing of values in single objects in *C++?

*Info: I do not know if the printing of array elements is in general in average slower, independent of a specific language.


Solution

  • If you take a look at the generated assembly you can notice that the compiler replaces loads of array elements and scalars with constants because their values are known at compile time.


    You measure the time it takes to format and output the elements of the array, as well as flush the output stream on each element (with std::endl), which involves a syscall.

    Loading an element from an array is much shorter than that. Loading a CPU register from L1 cache takes 4 CPU cycles (less than 1 nanosecond on a 5GHz CPU), from memory (last level cache miss) on i9-i9900KS ~215 CPU cycles, on Ryzen ~280 CPU cycles.

    There should be no measurable difference between loading an element from an array or from a (scalar) variable. Loading an element from an array may involve a bit of index arithmetic in the generated assembly, but that may be hardly measurable.


    When I put a loop around your timings to let the CPU increase its frequency to its max, page fault pages in and warm-up CPU cache; and replace std::endl with '\n' I get the following timings:

    Time passed with array:     0.029154 ms
    Time passed with variables: 0.029286 ms
    
    Time passed with array:     0.027148 ms
    Time passed with variables: 0.027587 ms
    

    This is to show that there is no measurable time difference between accessing array elements and scalar variables (but it is still measuring std::cout times).