The reasoning behind Clang's implementation of std::function's move semantics

In libc++'s implementation of std::function, if the type-erased function object is small enough to fit in the small-buffer-optimization (SBO) storage, a move operation copies it instead of moving it. Yet not every object with a small memory footprint is cheap to copy. Why copy rather than move?

Consider this example with Clang (using shared_ptr because it has reference counting):

https://wandbox.org/permlink/9oOhjigTtOt9A8Nt

The semantics of test1() are identical to those of test3(), where an explicit copy is used. shared_ptr's reference count makes this visible.

On the other hand, GCC behaves reasonably and predictably:

https://wandbox.org/permlink/bYUDDr0JFMi8Ord6

Both behaviors are allowed by the standard: std::function requires its target to be copyable, a moved-from object is left in an unspecified state, and so on. But why do it this way? The same reasoning could be applied to std::map: if both the key and the value are copyable, then why not make a new copy whenever someone std::moves a std::map? That would also be within the standard's requirements.

According to cppreference.com, the move constructor should perform a move, and what it moves should be the target.

The example:

#include <iostream>
#include <memory>
#include <functional>
#include <array>
#include <type_traits>

void test1()
{
    /// Some small tiny type of resource. Also, `shared_ptr` is used because it has a neat
    /// `use_count()` feature that will allow us to see what's going on behind the 'curtains'.
    auto foo = std::make_shared<int>(0);

    /// `foo` is not a trivially copyable type. Copying it may incur a huge overhead.
    /// Alas, pre-C++23 we don't have the luxury of `move_only_function`,
    /// so 'staying standard' we're stuck with std::function.
    static_assert(!std::is_trivially_copyable_v<std::decay_t<decltype(foo)>>);
    static_assert(!std::is_trivially_copy_constructible_v<std::decay_t<decltype(foo)>>);

    std::cout << std::endl;
    std::cout << "Test 1: tiny function that is probably stored in SBO" << std::endl;
    std::cout << "Ref count: " << foo.use_count() << std::endl;
    
    std::function<void()> f = [foo] {
        /// Do stuff.  
    };

    std::cout << "Ref count: " << foo.use_count() << std::endl;

    {
        auto g = std::move(f);

        /// With libc++, the underlying type-erased data is actually copied, not moved.
        std::cout << "Ref count: " << foo.use_count() << std::endl;
    }

    std::cout << "Ref count: " << foo.use_count() << std::endl;
}

void test2()
{
    auto foo = std::make_shared<int>(0);

    std::cout << std::endl;
    std::cout << "Test 2: non-tiny function that doesn't fit in SBO" << std::endl;
    std::cout << "Ref count: " << foo.use_count() << std::endl;
    
    std::function<void()> f = [foo, bar = std::array<char, 1024>()] {
        (void)bar;
        /// Do stuff.
    };

    std::cout << "Ref count: " << foo.use_count() << std::endl;

    {
        auto g = std::move(f);

        std::cout << "Ref count: " << foo.use_count() << std::endl;
    }

    std::cout << "Ref count: " << foo.use_count() << std::endl;
}

void test3()
{
    auto foo = std::make_shared<int>(0);

    std::cout << std::endl;
    std::cout << "Test 3: tiny function but using a copy" << std::endl;
    std::cout << "Ref count: " << foo.use_count() << std::endl;
    
    std::function<void()> f = [foo] {
        /// Do stuff.  
    };

    std::cout << "Ref count: " << foo.use_count() << std::endl;

    {
        auto g = f;

        std::cout << "Ref count: " << foo.use_count() << std::endl;
    }

    std::cout << "Ref count: " << foo.use_count() << std::endl;
}

int main()
{
    test1();
    test2();
    test3();
    return 0;
}

Solution

This is a bug in libc++ that cannot be fixed immediately because the fix would break ABI. The implementation is apparently conforming, but it is often suboptimal.

It's not clear exactly why the Clang devs made this implementation choice in the first place (although maybe, if you're really lucky, someone from Clang will show up and answer this question). It may simply be that Clang's strategy avoids needing a "vtable" entry for move construction, which simplifies the implementation. Also, as I wrote elsewhere, the Clang implementation only uses the small object optimization (SOO, which the question calls SBO) if the callable is nothrow-copy-constructible in the first place. It will therefore never use SOO for things that have to allocate from the heap when copied (like a struct that contains a std::vector), and so it will never copy such things upon move construction^*. That limits the practical effect of the cases where it does copy instead of moving, although it will certainly still degrade performance in some cases, such as with std::shared_ptr, where a copy must use atomic instructions and a move is almost free.

^* OK, there is a caveat here: if you use the allocator-extended move constructor, and the provided allocator is unequal to the one from the source object, you force the libc++ implementation to perform a copy, since, in the case of unequal allocators, it can't just take ownership of the pointer to the out-of-line callable held by the source object. However, you shouldn't use the allocator-extended move constructor anyway; allocator support was removed in C++17 because implementations had various issues with it.