c++sorting c++11 optimization pass-by-reference

Passing by reference to comparator function (C++11)

I am trying to speed up my code (minimal, reproducible example below) and I was told that passing by reference would be a more efficient method for my comparator function. That was the first time I heard of the phrase, so I looked it up and found some websites with examples, but I don't understand when and how to use it. How would I use it in this case?

#include <array>
#include <iostream>
#include <algorithm>
#include <fstream>
#include <ctime>
#include <random>

using namespace std;

class arrMember {
public:
    int var;
    arrMember(int var) :
        var(var) {}
    arrMember() {};
};

array<int, 1000000> arraySource;

array<arrMember, 1000000> arrayObjects;

bool compare(arrMember(x), arrMember(y)) {
    return (x.var < y.var);
}

void arrayPrint() {
    ofstream output("arrayPrint.txt");
    for (int k = 0; k != arrayObjects.size(); k++) {
        output << arrayObjects[k].var << endl;
    }

    output.close();
}

void sort() {
    int j = 0;
    for (auto i = arraySource.begin(); i != arraySource.end(); i++, j++) {
        arrayObjects[j] = arrMember(arraySource[j]);
    }

    sort(arrayObjects.begin(), arrayObjects.end(), compare);
}

int main(){
    random_device rd{};
    mt19937 engine{ rd() };
    uniform_int_distribution<int> dist{ 0, 9999 };
    for (int x = 0; x < arraySource.size(); ++x){
        arraySource[x] = dist(engine);
    }

    cout << "Executing sort...\n";
    clock_t t1 = clock();
    sort();
    clock_t t2 = clock();
    double timeDiff = ((double)(t2 - t1)) / CLOCKS_PER_SEC;

    cout << "Sort finished. CPU time was " << timeDiff << " seconds.\n";

    arrayPrint();

    return 0;
}

Thanks for the help.

Solution

For trivially small types, passing by reference doesn't help; copy-constructing a class consisting of a single int is basically the same cost as taking the address of an existing instance, and copy constructing means the comparator doesn't need to dereference a pointer to find the value, it's already on the local stack.

For a larger type with expensive copy-construction, you could change (original code minus unnecessary parens):

bool compare(arrMember x, arrMember y) {
    return x.var < y.var;
}

to:

bool compare(const arrMember& x, const arrMember& y) {
    return x.var < y.var;
}

and harvest a meaningful speed-up, but for a simple int wrapper class, you gain nothing.

What will make a difference, regardless of the size of the class in question, is replacing raw functions with functors (or lambdas, which are syntactic sugar for functors). std::sort specializes the template to the type of the comparator, and functions themselves aren't types; they're effectively instances of the set of functions that share the same prototype. So if you implement both:

bool compare(const arrMember& x, const arrMember& y) {
    return x.var < y.var;
}
bool compareReverse(const arrMember& x, const arrMember& y) {
    return x.var > y.var;
}

and call std::sort with both compare and compareReverse at different points in your program, it only makes one specialization of std::sort for bool (*)(const arrMember&, const arrMember&), and that single specialization must actually pass around and call the provided function through a pointer; the cost of calling a function at all is significantly higher than the trivial cost of performing the comparison itself, and calling through a pointer is usually even more expensive.

By contrast, functors (and lambdas) are unique types, so std::sort can fully specialize to them, including inlining the comparison, so no call to the comparator function ever occurs; the comparator logic is inlined directly into a unique specialization of std::sort. So instead of:

bool compare(const arrMember& x, const arrMember& y) {
    return x.var < y.var;
}
std::sort(..., compare);

you could do this:

struct compare {
    bool operator()(const arrMember& x, const arrMember& y) const {
        return x.var < y.var;
    }
};
std::sort(..., compare());

or inline the whole thing as a C++11 lambda:

std::sort(..., [](const arrMember& x, const arrMember& y) { return x.var < y.var; });

Either way, the code will run faster; Godbolt shows that either function pointer approach is pretty much the same, while with the functor approach, you reduce runtime relative to the function pointer approach by about a third (saving a little more passing by value in this case, but hardly worth the trouble to think about most of the time; I'd default to passing by const reference, and only consider switching if profiling said it was slow, and the type was simple enough that passing by value might help).

This benefit to templates and functors is why C++'s std::sort reliably beats C's qsort when used appropriately; C lacks templates, so it can't specialize qsort at all, and must always call the comparator through a function pointer. If you use std::sort with a function, it's no real improvement on qsort, but used with a functor/lambda, it generates much faster code at the expense of generating more code (a unique specialization of std::sort for each functor/lambda). You could get the same benefit in C by copy-pasting the code implementing qsort, getting rid of the comparator and inlining the comparison yourself, but that's a lot of maintenance overhead; C++ templates do 99% of the work for you, you just need to remember to use functors/lambdas for your callbacks, not raw functions.