Intending to study a sorting algorithm of my own, I decided to compare its performance with the classical quicksort, and to my great surprise I discovered that the time taken by my implementation of quicksort is far from proportional to N log(N). I thoroughly tried to find an error in my quicksort, but without success. It is a simple version of the algorithm working with arrays of Integer of different sizes, filled with random numbers, and I have no idea where the error could sneak in. I have even counted all the comparisons and swaps executed by my code, and their number was fairly proportional to N log(N). I am completely confused and can't understand what I'm observing. Here are the benchmark results for sorting arrays of 1,000, 2,000, 4,000, 8,000 and 16,000 random values (measured with JMH):
Benchmark Mode Cnt Score Error Units 2N / N ratio
QSortBenchmarks.sortArray01000 avgt 5 561.505 ± 2.992 us/op
QSortBenchmarks.sortArray02000 avgt 5 2433.307 ± 11.770 us/op 4.334
QSortBenchmarks.sortArray04000 avgt 5 8510.448 ± 34.051 us/op 3.497
QSortBenchmarks.sortArray08000 avgt 5 38269.492 ± 161.010 us/op 4.497
QSortBenchmarks.sortArray16000 avgt 5 147132.524 ± 261.963 us/op 3.845
Obviously the time complexity I observe is very far from O(n log(n)); it's almost O(n^2). One might harbor the tiniest suspicion that the random seed happened to be so unlucky that the values in the arrays were close to the worst case. The probability of this is very close to 0, yet not 0. But the results with a few different random seeds were very similar to these.
And here are the numbers of comparisons and swaps (for 40 iterations with randomly-filled arrays of each size):
Avr. Comparisons Avr. Swaps 2N / N ratio
sortArray01000(): 12119.925, 2398.925
sortArray02000(): 26581.600, 5268.525 2.193, 2.196
sortArray04000(): 59866.925, 11451.625 2.252, 2.174
sortArray08000(): 127731.025, 25006.425 2.134, 2.184
sortArray16000(): 273409.925, 54481.525 2.141, 2.179
As everyone can see, the numbers of operations conform to the O(N log(N)) law rather well.
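For reference, the doubling ratios one should expect can be computed with a small stand-alone snippet like the one below (not part of the benchmark itself). For these sizes an N log(N) algorithm should slow down by a factor of about 2.15-2.20 when N doubles, which matches the comparison and swap ratios above (2.13-2.25), while an N^2 algorithm should slow down by a factor of 4, which is much closer to the measured time ratios (3.5-4.5):

public class GrowthRatios {
    public static void main(String[] args) {
        for (int n = 1000; n <= 8000; n *= 2) {
            // Expected 2N / N ratio for an N log N algorithm (about 2.15 .. 2.20 here)
            double nLogNRatio = (2.0 * n * Math.log(2.0 * n)) / (n * Math.log(n));
            // Expected 2N / N ratio for an N^2 algorithm (always 4)
            double squaredRatio = (2.0 * n) * (2.0 * n) / ((double) n * n);
            System.out.printf("N = %5d -> N log N ratio: %.3f, N^2 ratio: %.1f%n",
                    n, nLogNRatio, squaredRatio);
        }
    }
}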
One could even suspect that the cost of an individual operation depends on the size of the array we deal with, so I checked that too (with simple methods that reverse arrays of different sizes) -- and no, as expected, it doesn't: the times were very close to O(n).
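(The reversal methods are not shown here; a minimal sketch of that kind of check, reusing fillWithRandoms() from the class below, might look like this:)

// Sketch of the per-operation cost check: reverse arrays of different sizes
// and time them; the time per reversal should grow roughly linearly with N.
private static void reverse(Integer[] array) {
    for (int i = 0, j = array.length - 1; i < j; i++, j--) {
        final Integer tmp = array[i];
        array[i] = array[j];
        array[j] = tmp;
    }
}

private static void timeReversals() {
    for (int n = 1000; n <= 16000; n *= 2) {
        final Integer[] array = new Integer[n];
        fillWithRandoms(array);
        final long start = System.nanoTime();
        for (int k = 0; k < 1000; k++) reverse(array);
        System.out.printf("N = %5d: %.3f us per reversal%n",
                n, (System.nanoTime() - start) * 1e-3 / 1000);
    }
}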
The only thing I can think of is that there's some tricky logical error in my code, but I can't figure it out.
Can anybody help me?
Here's the code:
import java.io.IOException;
import java.util.Locale;
import java.util.Random;
import java.util.function.Consumer;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
/**
* Why does quicksort take time disproportionate to N * log(N)?
* Rectified for StackOverflow
* 21.05.17 16:20:01
*/
@State(value = Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(java.util.concurrent.TimeUnit.MICROSECONDS)
@Fork(value = 2)
@Warmup(iterations = 5, time = 10)
@Measurement(iterations = 5, time = 10)
public class QSortBenchmarks {
private static final int LOOPS_TO_ITERATE = 40; // 40;
private static final int RAND_SEED = 123456789;
private static final int RAND_LIMIT = 10000;
private static final Random RAND = new Random(RAND_SEED); // Constant seed for reproducibility;
private static int cmpCount = 0, swapCount = 0;
private static int cmpTotal = 0, swapTotal = 0;
private static final Integer[] array01000 = new Integer[1000];
private static final Integer[] array02000 = new Integer[2000];
private static final Integer[] array04000 = new Integer[4000];
private static final Integer[] array08000 = new Integer[8000];
private static final Integer[] array16000 = new Integer[16000];
@Setup
public static void initData() {
cmpCount = 0; swapCount = 0;
fillWithRandoms(array01000);
fillWithRandoms(array02000);
fillWithRandoms(array04000);
fillWithRandoms(array08000);
fillWithRandoms(array16000);
}
public static void main(String[] args) throws IOException, RunnerException {
Locale.setDefault(Locale.US);
initData();
runJMH(); // Run the benchmarks. Comment this out if you just want to count comparisons and swaps.
// System.exit(0); // Uncomment this if you don't want to count comparisons and swaps
System.out.printf("\nRand seed = %d, rand limit = %d, iterations = %d\n",
RAND_SEED, RAND_LIMIT, LOOPS_TO_ITERATE);
System.out.print("sortArray01000(): ");
loopOverMethod(qq -> sortArray01000());
System.out.print("sortArray02000(): ");
loopOverMethod(qq -> sortArray02000());
System.out.print("sortArray04000(): ");
loopOverMethod(qq -> sortArray04000());
System.out.print("sortArray08000(): ");
loopOverMethod(qq -> sortArray08000());
System.out.print("sortArray16000(): ");
loopOverMethod(qq -> sortArray16000());
}
private static void loopOverMethod(Consumer<Object> method) {
cmpTotal = 0; swapTotal = 0;
for (int loops = 0; loops < LOOPS_TO_ITERATE; loops++ ) {
initData();
method.accept(null);
cmpTotal += cmpCount; swapTotal += swapCount;
}
System.out.printf("avrg compares: %12.3f, swaps: %12.3f\n",
(double)cmpTotal / LOOPS_TO_ITERATE, (double)swapTotal / LOOPS_TO_ITERATE);
}
/**
* @throws RunnerException
*/
private static void runJMH() throws RunnerException {
final Options opt = new OptionsBuilder()
.include(QSortBenchmarks.class.getSimpleName())
.forks(1)
.build();
new Runner(opt).run();
}
private static void fillWithRandoms(Integer[] array) {
for (int i = 0; i < array.length; i++) { // Fill the array with random values below RAND_LIMIT
array[i] = (int)(RAND.nextDouble() * RAND_LIMIT);
}
}
@Benchmark
public static void sortArray01000() {
final Integer[] array = array01000;
quickSort(array, 0, array.length - 1);
}
@Benchmark
public static void sortArray02000() {
final Integer[] array = array02000;
quickSort(array, 0, array.length - 1);
}
@Benchmark
public static void sortArray04000() {
final Integer[] array = array04000;
quickSort(array, 0, array.length - 1);
}
@Benchmark
public static void sortArray08000() {
final Integer[] array = array08000;
quickSort(array, 0, array.length - 1);
}
@Benchmark
public static void sortArray16000() {
final Integer[] array = array16000;
quickSort(array, 0, array.length - 1);
}
private static void quickSort(Integer[] array, int lo, int hi) {
if (hi <= lo) return;
final int j = partition(array, lo, hi);
quickSort(array, lo, j - 1);
quickSort(array, j + 1, hi);
}
private static int partition(Integer[] array, int lo, int hi) {
int i = lo, j = hi + 1;
while (true) {
while (compare(array[++i], array[lo]) < 0) // while (array[++i] < array[lo])
if (i == hi) break;
while (compare(array[lo], array[--j]) < 0) // while (array[lo] < array[--j])
if (j == lo) break;
if (i >= j) break;
swapItems(array, i, j);
}
swapItems(array, lo, j);
return j;
}
private static int compare(Integer v1, Integer v2) {
cmpCount++;
return v1.compareTo(v2);
}
private static void swapItems(Integer[] array, int i, int j) {
swapCount++;
final Integer tmp = array[i];
array[i] = array[j];
array[j] = tmp;
}
}
The situation has cleared up. It was JMH that showed such strange results, which had nothing in common with the actual execution times. A simple hand-made benchmark using System.nanoTime(), for arrays of 10,000, 20,000, 40,000, 80,000, and 160,000 items, showed the following times:
Rand seed = 123, rand limit = 1000000, iterations = 1000
sortArray_010k(): avrg time: 1049.538 mks/op
sortArray_020k(): avrg time: 2282.189 mks/op
sortArray_040k(): avrg time: 4976.079 mks/op
sortArray_080k(): avrg time: 10762.461 mks/op
sortArray_160k(): avrg time: 23115.345 mks/op
while the JMH results for the same arrays are as follows:
Benchmark Mode Cnt Score Error Units
QSortBenchmarks.sortArray_010k avgt 5 109.454 ± 0.444 ms/op
QSortBenchmarks.sortArray_020k avgt 5 373.518 ± 17.439 ms/op
QSortBenchmarks.sortArray_040k avgt 5 1350.420 ± 26.733 ms/op
QSortBenchmarks.sortArray_080k avgt 5 6519.015 ± 48.770 ms/op
QSortBenchmarks.sortArray_160k avgt 5 26837.697 ± 926.132 ms/op
The figures shown by the hand-made benchmark conform to N log(N) quite well, look realistic, and are in good agreement with the observed execution time in seconds. And they are between 100 and more than 1000 times smaller than the figures shown by JMH.
Here's the modified loopOverMethod() with which they were obtained:
private static void loopOverMethod(Consumer<Object> method) {
for (int loops = 0; loops < 100; loops++ ) { // Kinda warmup
initData();
method.accept(null);
}
long time = 0;
cmpTotal = 0; swapTotal = 0;
for (int loops = 0; loops < LOOPS_TO_ITERATE; loops++ ) {
initData();
time -= System.nanoTime();
method.accept(null);
time += System.nanoTime();
cmpTotal += cmpCount; swapTotal += swapCount;
}
System.out.printf("avrg time: \t%10.3f mks\n",
time * 1e-3 / LOOPS_TO_ITERATE);
}
But now another question arises. Why does JMH show such strange results? What is wrong with the way I use it? And obviously I'll have to revise the other projects where I use JMH. Rather unpleasant news.
I'm putting this here as an update to the question, since comments don't allow inserting quotes. The comprehensive answer given by Thomas Kleger is absolutely correct. After following his suggestions, I got the following results:
JMH:
Benchmark Mode Cnt Score Error Units
QSortBenchmarks.sortArray_010k avgt 5 975.028 ± 23.907 us/op
QSortBenchmarks.sortArray_020k avgt 5 2253.627 ± 94.108 us/op
QSortBenchmarks.sortArray_040k avgt 5 4836.680 ± 80.964 us/op
QSortBenchmarks.sortArray_080k avgt 5 10041.063 ± 27.796 us/op
QSortBenchmarks.sortArray_160k avgt 5 21232.223 ± 32.008 us/op
Hand-made benchmarks:
Rand seed = 123, rand limit = 1000000, iterations = 1000
sortArray_010k(): avrg time: 1060.951 mks
sortArray_020k(): avrg time: 2296.533 mks
sortArray_040k(): avrg time: 5021.629 mks
sortArray_080k(): avrg time: 10855.963 mks
sortArray_160k(): avrg time: 23335.923 mks
Now the results look quite plausible and I'm completely satisfied with them.
Three points work together against your implementation:
- Your quickSort() uses the leftmost element of the partition as the pivot. In the very early versions of quicksort, the leftmost element would often be chosen as the pivot element; unfortunately, this causes worst-case behavior on already sorted arrays.
- Your benchmark methods sort the arrays in place.
- The @Setup method is by default executed only once per trial, so every invocation after the first one sorts an array that is already sorted.
To fix this, you could change your benchmark methods. For example, you could change sortArray01000() to:
@Benchmark
public static void sortArray01000() {
final Integer[] array = Arrays.copyOf(array01000, array01000.length);
quickSort(array, 0, array.length - 1);
}
Or you could modify the @Setup annotation so that the setup method is executed before each invocation of the benchmark method:
@Setup(Level.Invocation)
public static void initData() {
//...
}
The @Setup annotation takes a parameter that determines when the setup method is executed.
The three levels are (https://hg.openjdk.java.net/code-tools/jmh/file/2be2df7dbaf8/jmh-core/src/main/java/org/openjdk/jmh/annotations/Level.java):
- Level.Trial: before each benchmark
- Level.Iteration: before each iteration
- Level.Invocation: before each execution of the benchmark method
The default level is Level.Trial (https://hg.openjdk.java.net/code-tools/jmh/file/2be2df7dbaf8/jmh-core/src/main/java/org/openjdk/jmh/annotations/Setup.java#l54).
What does that mean for your test?
To understand this you have to understand how JMH executes your benchmark: for each benchmark method it runs the configured warmup iterations (5 iterations of 10 seconds with your settings) and then the measurement iterations (again 5 iterations of 10 seconds), and within each iteration it calls the benchmark method back to back as many times as fit into the iteration time. The setup method, however, is executed according to its level: with the default Level.Trial it runs just once, before the whole run.
Now with @Setup(Level.Trial) and a benchmark method that sorts the input data in place, that means that only the very first call of the quicksort method can show the O(N log(N)) behavior; all remaining calls operate on the already sorted array and show the worst-case behavior of O(N^2).
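If you want to see this effect directly, a quick diagnostic (hypothetical, not part of your code) is to count how many invocations receive an already sorted array, for example by changing one of the benchmark methods:

// Hypothetical diagnostic: count how many benchmark invocations start from
// an array that is already sorted.
private static long sortedInputs = 0, totalInputs = 0;

private static boolean isSorted(Integer[] array) {
    for (int i = 1; i < array.length; i++) {
        if (array[i - 1].compareTo(array[i]) > 0) return false;
    }
    return true;
}

@Benchmark
public static void sortArray01000() {
    totalInputs++;
    if (isSorted(array01000)) sortedInputs++; // with Level.Trial this is true for every call but the first
    quickSort(array01000, 0, array01000.length - 1);
}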
With @Setup(Level.Iteration) the situation is still not much better: now it is only the first call of the benchmark method in each iteration that shows O(N log(N)) behavior, while the remaining ~20'000 calls per iteration still show O(N^2).
With @Setup(Level.Invocation), finally, each invocation of the benchmark method (and therefore each invocation of the quicksort) gets its own, unsorted array as input, and this clearly shows in the results:
@Setup(Level.Trial)
1000: 780 us
2000: 3300 us
@Setup(Level.Iteration)
1000: 780 us
2000: 3280 us
4000: 11700 us
@Setup(Level.Invocation)
1000: 58 us
2000: 124 us
4000: 280 us
With my proposed change (copying the input array in the benchmark method) I get slightly better results, but these could be due to caching effects:
1000: 25 us
2000: 108 us
4000: 260 us
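As a side note to the first point above: independent of the benchmark setup, the sorted-input worst case can be largely avoided by choosing a random pivot. A minimal sketch (not from your code) that reuses your partition() and swapItems():

// Illustration only: swapping a randomly chosen element into the leftmost
// position before partitioning randomizes the pivot and avoids the O(N^2)
// worst case on already sorted inputs with high probability.
private static final java.util.Random PIVOT_RAND = new java.util.Random();

private static void quickSortRandomPivot(Integer[] array, int lo, int hi) {
    if (hi <= lo) return;
    final int p = lo + PIVOT_RAND.nextInt(hi - lo + 1);
    swapItems(array, lo, p);                // move the random pivot to array[lo]
    final int j = partition(array, lo, hi); // partition() uses array[lo] as the pivot
    quickSortRandomPivot(array, lo, j - 1);
    quickSortRandomPivot(array, j + 1, hi);
}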