c++ performance templates overloading overload-resolution

Is T Min(T, T); always better than const T& Min(const T&, const T&); if sizeof(T) <= sizeof(void*)?


template<class T>
T Min(T a, T b)
{
    return a < b ? a : b;
}

template<class T>
const T& Min(const T& a, const T& b)
{
    return a < b ? a : b;
}

Which is better? Or, should I redefine the template function as follows:

template<class T, ENABLE_IF(sizeof(T) <= sizeof(void*))>
T Min(T a, T b)
{
    return a < b ? a : b;
}

template<class T, ENABLE_IF(sizeof(T) > sizeof(void*))>
const T& Min(const T& a, const T& b)
{
    return a < b ? a : b;
}

// Note: ENABLE_IF is a helper macro.
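
For readers who want to try the size-based dispatch, here is a minimal sketch. The question does not show how ENABLE_IF is defined, so the definition below is an assumption built on std::enable_if, and Big is a hypothetical type used only to exercise the by-reference overload.

```cpp
#include <type_traits>
#include <utility>

// Assumed definition; the real ENABLE_IF macro is not shown in the question.
// It expands to an unnamed non-type template parameter that only exists
// when cond is true, so the other overload drops out via SFINAE.
#define ENABLE_IF(cond) typename std::enable_if<(cond), int>::type = 0

// Types no larger than a pointer: pass and return by value.
template<class T, ENABLE_IF(sizeof(T) <= sizeof(void*))>
T Min(T a, T b)
{
    return a < b ? a : b;
}

// Larger types: pass and return by const reference.
template<class T, ENABLE_IF(sizeof(T) > sizeof(void*))>
const T& Min(const T& a, const T& b)
{
    return a < b ? a : b;
}

// Hypothetical type that is larger than a pointer, for illustration only.
struct Big
{
    long long payload[4];
};

bool operator<(const Big& x, const Big& y)
{
    return x.payload[0] < y.payload[0];
}

// The return type shows which overload was selected.
static_assert(std::is_same<decltype(Min(1, 2)), int>::value,
              "int is passed by value");
static_assert(std::is_same<decltype(Min(std::declval<const Big&>(),
                                        std::declval<const Big&>())),
                           const Big&>::value,
              "Big is passed by const reference");
```

The parentheses around cond inside the macro matter: without them, the > in sizeof(T) > sizeof(void*) would be parsed as the end of the template argument list.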

Solution

  • I can't say for sure, but given that

    template<class T>
    const T& Min(const T& a, const T& b)
    {
        return a < b ? a : b;
    }
    

    ...is essentially instantiated as (let's say for T = bool):

    const bool & Min(const bool & a, const bool & b)
    {
        return a < b ? a : b;
    }
    

    I would say it is a fair assumption that the compiler will pass the bool in the most efficient way it can. In other words, writing & in the signature does not necessarily mean an address is passed at run time: once the call is inlined, the optimizer is free to keep the value in a register.

    Another couple of ideas: When a function is called, arguments are either pushed onto the stack, or they're passed via a register. The only way that

    bool Min(bool a, bool b)
    {
        return a < b ? a : b;
    }
    

    would be "better"/"faster" than

    const bool & Min(const bool & a, const bool & b)
    {
        return a < b ? a : b;
    }
    

    is if passing a bool were faster than passing a const bool & (ignoring the dereference at the start of the function). I can't see this being true: unless the argument is pushed onto the stack (which depends on your calling convention), it travels in a register, and registers are at least the size of a pointer on the host architecture (e.g. rax is 64-bit, eax is 32-bit).

    Further, I presume the reference version would be easier for the compiler to inline, since the function signature alone promises that the function never modifies a and b, and thus needs no local storage for them.

    For user-defined types, there are two cases.

    1. The type fits in a register, and we can treat it just like a basic type.
    2. The type does not fit in a register and must be passed by reference (if we use const T&) or as a copy (if we use just T). Since copying a class object invokes its copy constructor, const T& will probably be faster in every case.
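
The second case can be made visible by counting copy-constructor calls. This is only a sketch: Tracked, MinByValue, MinByRef and the two helper functions are hypothetical names introduced for illustration, not something from the question.

```cpp
// Hypothetical type that counts how often it is copy-constructed.
struct Tracked
{
    int value;
    static int copies;

    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
};
int Tracked::copies = 0;

bool operator<(const Tracked& x, const Tracked& y)
{
    return x.value < y.value;
}

// The two signatures from the question, renamed so both can coexist.
template<class T>
T MinByValue(T a, T b)
{
    return a < b ? a : b;
}

template<class T>
const T& MinByRef(const T& a, const T& b)
{
    return a < b ? a : b;
}

// Number of copy-constructor calls caused by one by-value call.
int CopiesByValue()
{
    Tracked x(1), y(2);
    Tracked::copies = 0;
    (void)MinByValue(x, y);
    return Tracked::copies;
}

// Number of copy-constructor calls caused by one by-reference call.
int CopiesByRef()
{
    Tracked x(1), y(2);
    Tracked::copies = 0;
    (void)MinByRef(x, y);
    return Tracked::copies;
}
```

With lvalue arguments, the by-value version copies both parameters and then copies the selected one into the return value, so CopiesByValue() should report three copies, while CopiesByRef() reports none. Whether that matters in practice depends on how expensive the type's copy constructor is.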

    But I'm really just speculating here. To find out whether bool and const bool & actually differ, the best way is to have your specific compiler output assembly and compare the two versions.
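
One possible way to set up that comparison is sketched below. The file name and function names are made up for illustration; -O2 and -S are the usual GCC/Clang flags for "optimize" and "stop after generating assembly".

```cpp
// min_check.cpp (hypothetical file name)
//
// Generate assembly with, for example:
//   g++ -O2 -S min_check.cpp -o min_check.s
// clang++ accepts the same flags.
//
// The functions are deliberately non-template and non-inline so that the
// compiler has to emit a standalone definition for each one, which makes
// the two symbols easy to find and compare in the output.

bool MinByValue(bool a, bool b)
{
    return a < b ? a : b;
}

const bool& MinByRef(const bool& a, const bool& b)
{
    return a < b ? a : b;
}
```

Keep in mind that this shows the out-of-line versions only. If the real call sites get inlined, the generated code there may look different, so it is worth inspecting a representative caller as well.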

    HTH.