Search code examples
javasortingmergesort

Mergesort running faster on larger inputs


I'm working on an empirical analysis of merge sort (sorting strings) for school, and I've run into a strange phenomenon that I can't explain or find an explanation of. When I run my code, I capture the running time using the built in system.nanotime() method, and for some reason at a certain input size, it actually takes less time to execute the sort routine than with a smaller input size.

My algorithm is just a basic merge sort, and my test code is simple too:

//Get current system time
long start = System.nanoTime();
//Perform mergesort procedure
a = q.sort(a);
//Calculate total elapsed sort time
long time = System.nanoTime()-start;

The output I got for elapsed time when sorting 900 strings was: 3928492ns For 1300 strings it was: 3541923ns

With both of those being the average of about 20 trials, so it's pretty consistent. After 1300 strings, the execution time continues to grow as expected. I'm thinking there might be some peak input size where this phenomenon is most noticeable.

So my Question: What might be causing this sudden increase in speed of the program? I was thinking there might be some sort of optimization going on with arrays holding larger amounts of data, although 1300 items in an array is hardly large.

Some info:

  • Compiler: Java version 1.7.0_07
  • Algorithm: Basic recursive merge sort (using arrays)
  • Input type: Strings 6-10 characters long, shuffled (random order)

Am I missing anything?


Solution

  • Am I missing anything?

    You're trying to do a microbenchmark, but the code you've posted so far does not resemble a well working sample. To do so, please follow the rules stated here: How do I write a correct micro-benchmark in Java?.

    The explanation about your code being faster is because after some iterations of your method, the JIT will trigger and the performance of your code will be optimized, thus your code getting faster, even when processing larger data.

    Some recommendations:

    • Use several array/list inputs with different size. Good values to do this kind of analysis are 100, 1000 (1k), 10000 (10k), 100000 (100k), 1000000 (1m) and random size values between these. You will get more accurate results when performing evaluations that take longer time.
    • Use arrays/lists of different objects. Create a POJO and make it implement the Comparable interface, then execute your sort method. As explained above, use different arrays values.

    Not directly related to your question, but the execution results are based on the JDK used. Eclipse is just an IDE and can work with different JDK versions, e.g. at my workplace I use JDK 6 u30 to work on projects on the company, but for personal projects (like proof of concepts) I use JDK 7 u40.