I've implemented a working multithreaded merge sort in C++, but I've hit a wall.
In my implementation, I recursively split an input vector into two parts, and then thread these two parts:
void MergeSort(vector<int> *in)
{
if(in->size() < 2)
return;
vector<int>::iterator ite = in->begin();
vector<int> left = vector<int> (ite, ite + in->size()/2);
vector<int> right = vector<int> (ite + in->size()/2, in->end() );
//current thread spawns 2 threads HERE
thread t1 = thread(MergeSort, &left);
thread t2 = thread(MergeSort, &right);
t1.join();
t2.join();
vector<int> ret;
ret.reserve(in->size() );
ret = MergeSortMerge(left, right);
in->clear();
in->insert(in->begin(), ret.begin(), ret.end() );
return;
}
The code appears to be pretty, but it's one of the most vicious codes I've ever written. Trying to sort an array of more than 1000 int values causes so many threads to spawn, that I run out of stack space, and my computer BSODs :( Consistently.
I am well aware of the reason why this code spawns so many threads, which isn't so good, but technically (if not theoretically), is this not a proper implementation?
Based on a bit of Googling, I seem to have found the need for a threadpool. Would the use of a threadpool resolve the fundamental issue I am running into, the fact that I am trying to spawn too many threads? If so, do you have any recommendations on libraries?
Thank you for the advice and help!
As zdan explained, you shall limit the number of threads. There are two things to consider to determine what's the limit,
The number of CPU cores. In C++11, you can use std::thread::hardware_concurrency()
to determine the hardware cores. However, this function may return 0 meaning that the program doesn't know how many cores, in this case, you may assume this value to be 2 or 4.
Limited by the number of data to be processed. You can divide the data to be processed by threads until 1 data per thread, but it will cost too much for only 1 data and it's not cost efficient. For example, you can probably say, when the number of data is smaller than 50, you don't want to divide anymore. So you can determine the maximum number of threads required based on something like total_data_number / 50 + 1
.
Then, you choose a minimum number between case 1 & case 2 to determine the limit.
In your case, because you are generating thread by recursion, you can try to determine the recursion depth in similar ways.