I'm reading C++ Concurrency In Action
and in page 32 (Chapter 2) there is this code.
template <typename Iterator, typename T>
struct accumulate_block
{
void operator()(Iterator first, Iterator last, T &result)
{
result = std::accumulate(first, last, result);
}
};
template <typename Iterator, typename T>
T parallel_accumulate(Iterator first, Iterator last, T init)
{
unsigned long const length = std::distance(first, last);
if (!length)
return init;
unsigned long const min_per_thread = 25;
unsigned long const max_threads = (length + min_per_thread - 1) / min_per_thread;
unsigned long const hardware_threads = std::thread::hardware_concurrency();
unsigned long const num_threads = std::min(hardware_threads != 0 ? hardware_threads : 2, max_threads);
unsigned long const block_size = length / num_threads;
std::vector<T> results(num_threads);
std::vector<std::thread> threads(num_threads - 1);
Iterator block_start = first;
for (unsigned long i = 0; i < (num_threads - 1); ++i)
{
Iterator block_end = block_start;
std::advance(block_end, block_size);
threads[i] = std::thread(accumulate_block<Iterator, T>(),
block_end, std::ref(results[i]));
block_start = block_end;
}
accumulate_block<Iterator,T(block_start,last,results[num_threads-1]);
for(auto& entry: threads)
entry.join();
return std::accumulate(results.begin(),results.end(),init);
}
I don't understand some things.
min_per_thread
? Is this just an arbitrary number or is there a thought behing it?unsigned long const max_threads=
(length+min_per_thread-1)/min_per_thread;
Why do we use this formula to find the "max threads" and what do we need it for?
I tried to search for other similar questions around this piece of code but found nothing.
Why did the author choose 25 as the min_per_thread? Is this just an arbitrary number or is there a thought behing it?
The number 25
is an arbitrary number. You may use any number as you like.
I don't understand the formula in this piece of code:
unsigned long const max_threads=
(length+min_per_thread-1)/min_per_thread;
When you divide a number, for example 13
(length
) by 4
(min_per_thread
) you will get 3
due to the integer arithmetic. The remainder is equal to 1, so you provided only 3
threads for the three ranges [0, 3]
, [4,7]
and [8, 11]
, but the element 12
is left without a thread. You need to provide a thread for the remainder.
You can write:
13 + ( 4 - 1 )
that is equal to 16
, and 16
divided by 4
will yield 4
. Now all ranges included in the range for the remainder will have their own thread.
If, for example, length
is equal to 12
then the added value min_per_thread - 1
will not affect the required number of threads, equal to 3
in this case.
In general, the remainder can be any value in the range [0, min_per_thread - 1]
.
Particulary, if length
is less than min_per_thread
then to use the expression without the operand min_per_thread - 1
you will get that max_threads
will be equal to 0
, though one thread is required in any case.