I am trying to plot the theoretical time complexity against measured running times for quicksort. The plots look good when I feed in increasing, decreasing, or constant data series (they follow n² nicely). The problem arises when I enter a random input stream.
For quicksort on random data the expected time complexity is n log n, but my data points suggest n². I have tried different implementations of the partitioning step, and I have verified that the elements do get sorted.
Here is my implementation of quicksort and the partition step:
template<typename BidirectionalIterator, typename Compare>
BidirectionalIterator partition_impl(BidirectionalIterator first,
                                     BidirectionalIterator last, Compare cmp) {
    auto pivot = std::prev(last, 1);
    if (first == pivot) return first;

    while (first != last) {
        while (cmp(*first, *pivot)) { ++first; }                      // search for element > pivot
        do { --last; } while (!cmp(*last, *pivot) && last >= first);  // search for element < pivot

        if (first > last) { // left iterator has passed the right iterator
            std::iter_swap(first, pivot);
            return first;
        }
        if (*first > *last) {
            std::iter_swap(first, last);
            ++first;
        }
    }
    std::iter_swap(first, pivot);
    return first;
}

template <typename FwdIt, typename Comp = std::less<>>
void quick_sort(FwdIt first, FwdIt last, Comp cmp = Comp{}) {
    if (std::distance(first, last) <= 1) return;

    FwdIt split = partition_impl(first, last, cmp);
    quick_sort(first, split, cmp);     // sort the left partition
    quick_sort(split + 1, last, cmp);  // sort the right partition
}
So my question: is there anything in my implementation that indicates what the problem may be?
This is what my graph looks like:
EDIT: Some additional code:
void measure(std::vector<int>& ds) {
    Timer::start();
    algo_ptr(ds.begin(), ds.end(), std::less<>());
    Timer::stop();
}

void collectSamples(data_series_t& data_series) {
    for (int k = 0; k < NUM_OF_SAMPLES; ++k) {
        auto ds = std::get<1>(data_series)(x_measurement_point);
        measure(ds);
        measurements.push_back(Timer::elapsedTime());
    }
}
The algo_ptr goes directly to my implementation above. For every data series I collect 10 samples, and on each iteration I generate a new random data series so that I don't sort already-sorted elements.
Thanks everyone for trying to help; I really appreciate it. I think I have found the solution.
The problem was that I had allowed only 50 distinct random values to occur. That is far too few, since I am sorting many more elements than that, so the input was full of duplicate keys.
This meant the pivot was effectively no longer random: with so many duplicates the partition step produced extremely unbalanced splits, which made my quicksort deteriorate to n².
In hindsight this seems like such an obvious thing I should have realized, but I am still a beginner with DSA.
After fixing this and making sure there are several million possible distinct random values, the graph looks like this: