Issue with trying to create a thread

I am attempting to write a function that takes a std::function and a vector of values and returns a vector containing the function's output for each value, using threads to speed it up.

When I compile my code, the compiler reports that it is unable to specialize the function template for std::invoke, and that it expected 1 argument but got 7.

Here is my code:

#include <vector>
#include <thread>
#include <functional>
#include <iterator>

template<
    typename RETURN,
    typename INPUT
>
void thread_instance(std::function<RETURN(INPUT)> function, 
                     const std::vector<INPUT>& input, 
                     typename std::vector<INPUT>::iterator input_start, 
                     typename std::vector<INPUT>::iterator input_end,
                     std::vector<RETURN>& output,
                     typename std::vector<RETURN>::iterator output_start)
{
    for (; input_start != input_end; ++input_start, ++output_start)
    {
        *output_start = function(*input_start);
    }
}

template<
    typename RETURN,
    typename INPUT
>
std::vector<RETURN> thread_map(std::function<RETURN(INPUT)> function, std::vector<INPUT> input, int thread_count)
{
    std::vector<std::thread> threads(thread_count);
    std::vector<RETURN> output(input.size());
    for (int i = 0; i < thread_count; ++i)
    {
        int start_index = (input.size() / thread_count) * i;
        int end_index = start_index + input.size() / thread_count;
        typename std::vector<INPUT>::iterator thread_input_start = input.begin() + start_index;
        typename std::vector<INPUT>::iterator thread_input_end = input.begin() + end_index;
        typename std::vector<RETURN>::iterator thread_output_start = output.begin() + start_index;
        threads[i] = std::thread(thread_instance<RETURN, INPUT>, function, input, thread_input_start, thread_input_end, output, thread_output_start);
    }
    for (int i = 0; i < thread_count; ++i)
    {
        threads[i].join();
    }
    return output;
}

int multiply_by_2(int num)
{
    return num * 2;
}

int main(int argc, char** argv)
{
    std::vector<int> nums_to_sum = { 4,3,67,5,32,6,3,2,4 };
    std::vector<int> summed_nums = thread_map(std::function<int(int)>(multiply_by_2), nums_to_sum, 12);
}

Solution

According to cppreference for std::thread:

The arguments to the thread function are moved or copied by value. If a reference argument needs to be passed to the thread function, it has to be wrapped (e.g., with std::ref or std::cref).

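Consider what would happen if you started a thread with a reference and the referenced object was destroyed while the thread was still running: a dangling reference and undefined behaviour. So the default of taking arguments by value makes sense. It also explains your error: std::thread hands its stored copies to the function as rvalues, and an rvalue std::vector<RETURN> cannot bind to the non-const reference parameter output, so the call through std::invoke cannot be formed.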
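To make the quoted rule concrete, here is a minimal sketch of passing a reference to a thread with std::ref. The names append_value and run_append are hypothetical and exist only for this illustration; they are not part of the code in the question.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical worker: appends a value to a vector owned by the caller.
void append_value(std::vector<int>& out, int value)
{
    out.push_back(value);
}

// Hypothetical helper showing the call site.
std::vector<int> run_append(int value)
{
    std::vector<int> results;

    // std::thread t(append_value, results, value);
    // The line above would not compile: std::thread copies results and passes
    // the copy as an rvalue, which cannot bind to std::vector<int>&.

    // std::ref wraps the vector so the thread works on the original object.
    std::thread t(append_value, std::ref(results), value);

    // results must outlive the thread, so join before it goes out of scope.
    t.join();
    assert(results.size() == 1);
    return results;
}
```

The wrapper only solves the compile error. Keeping the referenced object alive until the thread has finished is still the caller's job.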

In your case, thread_instance never uses the output vector itself. An iterator into the output vector is enough, and iterators can be passed by value. So I would recommend removing the output reference parameter from the function (and the matching output argument from the std::thread call in thread_map):

template<
    typename RETURN,
    typename INPUT
>
void thread_instance(std::function<RETURN(INPUT)> function,
    const std::vector<INPUT>& input,
    typename std::vector<INPUT>::iterator input_start,
    typename std::vector<INPUT>::iterator input_end,
    typename std::vector<RETURN>::iterator output_start)
{
    for (; input_start != input_end; ++input_start, ++output_start)
    {
        *output_start = function(*input_start);
    }
}
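The call site in thread_map needs the matching change. Below is a minimal sketch of both functions together, assuming the partitioning from the question is kept unchanged. Passing input with std::cref is an extra choice made here to avoid copying the vector once per thread; the fix itself does not require it.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

template<typename RETURN, typename INPUT>
void thread_instance(std::function<RETURN(INPUT)> function,
    const std::vector<INPUT>& input,
    typename std::vector<INPUT>::iterator input_start,
    typename std::vector<INPUT>::iterator input_end,
    typename std::vector<RETURN>::iterator output_start)
{
    for (; input_start != input_end; ++input_start, ++output_start)
    {
        *output_start = function(*input_start);
    }
}

template<typename RETURN, typename INPUT>
std::vector<RETURN> thread_map(std::function<RETURN(INPUT)> function, std::vector<INPUT> input, int thread_count)
{
    assert(thread_count > 0);
    std::vector<std::thread> threads(thread_count);
    std::vector<RETURN> output(input.size());

    // Same integer division as in the question: it only covers every element
    // when thread_count divides input.size() evenly.
    const int chunk = static_cast<int>(input.size()) / thread_count;
    for (int i = 0; i < thread_count; ++i)
    {
        auto in_start = input.begin() + chunk * i;
        auto in_end = in_start + chunk;
        auto out_start = output.begin() + chunk * i;
        // No output vector argument any more; the iterators are copied by value.
        threads[i] = std::thread(thread_instance<RETURN, INPUT>, function,
                                 std::cref(input), in_start, in_end, out_start);
    }
    for (auto& t : threads)
    {
        t.join();
    }
    return output;
}

int multiply_by_2(int num)
{
    return num * 2;
}
```

Calling thread_map(std::function<int(int)>(multiply_by_2), nums, 3) with the nine values from the question gives each thread three elements. Each thread writes to a different part of output, so the threads do not race with one another.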

Another issue is the integer division used to calculate the start index:

int start_index = (input.size() / thread_count) * i;

If thread_count is greater than input.size(), the quotient is zero, so every thread gets an empty range and no element is processed. When thread_count does not divide input.size() evenly, the trailing elements are skipped. Instead of 12, a number such as 3, which is a divisor of 9, would be better. Also bear in mind that creating threads is relatively expensive, so you don't want to create too many. std::thread::hardware_concurrency() returns a hint for the number of concurrent threads your system supports (it may return 0 if the value cannot be determined).
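If you want the split to work for any thread count, one possible approach is to hand the remainder out one element at a time to the first few threads. The sketch below is only an illustration: choose_thread_count and split_ranges are hypothetical helpers, not part of the original code, and the ranges are half-open index pairs that you would turn into iterators yourself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: picks a thread count no larger than the amount of work.
std::size_t choose_thread_count(std::size_t item_count)
{
    // hardware_concurrency() is only a hint and may return 0 if unknown.
    std::size_t hw = std::thread::hardware_concurrency();
    if (hw == 0)
    {
        hw = 1;
    }
    return std::max<std::size_t>(1, std::min(hw, item_count));
}

// Hypothetical helper: splits [0, item_count) into thread_count half-open
// ranges [first, second). The first (item_count % thread_count) ranges get
// one extra element, so nothing is skipped.
std::vector<std::pair<std::size_t, std::size_t>> split_ranges(std::size_t item_count,
                                                              std::size_t thread_count)
{
    assert(thread_count > 0);
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    const std::size_t base = item_count / thread_count;
    const std::size_t extra = item_count % thread_count;
    std::size_t start = 0;
    for (std::size_t i = 0; i < thread_count; ++i)
    {
        const std::size_t length = base + (i < extra ? 1 : 0);
        ranges.emplace_back(start, start + length);
        start += length;
    }
    return ranges;
}
```

For 9 items and 4 threads this yields the ranges [0,3), [3,5), [5,7) and [7,9). With 12 threads the first nine ranges hold one element each and the last three are empty, which is harmless but wasteful, hence the clamp in choose_thread_count.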