I've been looking for a way to implement Timsort for C++ (Implementation found on Github) with multithreading and I've tried using in this process. I'm sure I'm using the correct compiler flags, but whenever I try to use Timsort as I do below:
#pragma omp parallel shared(DataVector)
{
gfx::timsort(DataVector.begin(), DataVector.end(), comp_1);
}
Note: Data being sorted is a vector containing strings of individual words, and I'm using a my own comparator.
It seems to sort in the same amount of time that it takes to run without using OpenMP. Using the appropriate includes for chrono and such, I timed values that were within .01 seconds of each other on average, hovering around 1.24 seconds for my sort.
Is there a reason the threading doesn't seem to work with my sorting method, or is it a problem with the way I'm implementing OpenMP?
Note on purpose: I have been using __gnu_parallel::sort as well with better results but I'm looking to compare these methods in practice myself.
omp parallel
needs to see the loop it is going to parallelize. The way you've declared it, omp will parallelize a single section of code which does not give any benefit.
Check your docs on omp parallel
usage.
To do a for loop you need to use omp parallel for
with the for statement following. The way you have it now it will run your timsort on every core you have.