Search code examples
javacsortingoptimizationbenchmarking

Is there any reason why Java would be performing faster than C for sorting int arrays with manual Insertion / Selection / Radix sorts?


Platform: OpenBSD, compiler: gcc, javac (OpenJDK version 17)

I have made a few sorting benchmarks in various languages, and I'm rather surprised by the performance of the Java program over the C program.

I have programmed the exact same sorting algorithms in both languages, and the Java program finishes almost twice as fast, all other languages are slower than the C implementation except the Java one.

The benchmarks involve running the sorting algorithm on a random array of numbers a set number of times.

I am compiling the program with -O3 and -Ofast, so I cannot apply any more compiler optimizations.

The exact code can be found here, but here is an excerpt from it:

Java:

public static void benchmark(SortingFunction func, int arraySize, int numTimes, String name, BufferedOutputStream bo) throws IOException {
    int[][] arrs = new int[numTimes][arraySize];
    for (int i = 0; i < numTimes; i ++) {
      arrs[i] = genRandArray(arraySize);
    }
    long start = System.nanoTime();
    for (int i = 0; i < numTimes; i ++) {
      func.sort(arrs[i]);
    }
    long end = System.nanoTime();
    double time = (double)(end - start) / 1e9;
    System.out.println("It took " + time + " seconds to do " + name + " sort " +
            numTimes + " times on arrays of size " + arraySize
    );
    String out = name+","+numTimes+","+arraySize+","+time;
    for (char c : out.toCharArray()) {
      bo.write(c);
    }
    bo.write('\n');
  }
public static void insertionSort(int[] array) {
    for (int i = 1; i < array.length; i ++) {
      int temp = array[i];
      int j;
      for (j = i - 1; j >= 0 && array[j] > temp; j --) {
        array[j+1] = array[j];
      }
      array[j+1] = temp;
    }
  }

C:

void benchmark(void (*f)(int *, int), int arr_size, int num_times, char *name,
               FILE *fp) {
  int *arrs[num_times];
  struct timeval start, end;
  double t;
  for (int i = 0; i < num_times; i++) {
    arrs[i] = gen_rand_arr(arr_size);
  }
  gettimeofday(&start, NULL);
  for (int i = 0; i < num_times; i++) {
    f(arrs[i], arr_size);
  }
  gettimeofday(&end, NULL);
  for (int i = 0; i < num_times; i++) {
    free(arrs[i]);
  }
  t = ((double)(end.tv_sec * 1000000 + end.tv_usec -
                (start.tv_sec * 1000000 + start.tv_usec))) /
      1000000;
  printf("It took %f seconds to do %s sort %d times on arrays of size %d\n", t,
         name, num_times, arr_size);

  if (fp != NULL) {
    fprintf(fp, "%s,%d,%d,%f\n", name, num_times, arr_size, t);
  }
}
void insertion_sort(int *arr, int arr_size) {
  for (int i = 1; i < arr_size; i++) {
    int temp = arr[i];
    int j;
    for (j = i - 1; j >= 0 && *(arr + j) > temp; j--) {
      arr[j + 1] = arr[j];
    }
    arr[j + 1] = temp;
  }
  return;
}

Are there some optimizations that Java is making to run faster that somehow change the algorithm? What is going on here?

Any explanations would be appreciated.

Here is a table of results that might help explain the difference:

Java:

name rep size time
Insertion 10000 1200 1.033
Insertion 10000 5000 15.677
Insertion 10000 12000 88.190
Selection 10000 1200 3.118
Selection 10000 5000 48.377
Selection 10000 12000 268.608
Radix 10000 1200 0.385
Radix 10000 5000 1.491
Radix 10000 12000 3.563
Bogo 1 11 1.330
Bogo 1 12 0.572
Bogo 1 13 122.777

C:

name rep size time
Insertion 10000 1200 1.766
Insertion 10000 5000 26.106
Insertion 10000 12000 140.582
Selection 10000 1200 4.011
Selection 10000 5000 60.442
Selection 10000 12000 340.608
Radix 10000 1200 0.430
Radix 10000 5000 1.788
Radix 10000 12000 4.295
Bogo 1 11 1.378
Bogo 1 12 2.296
Bogo 1 13 1586.73

Edit:

I modified the benchmarking function to generate the arrays beforehand, in Java it overflows the heap, and in C it makes it not much faster (around 25%, but Java is still faster).

fwiw I also changed how I indexed things in C from *(arr + i) to arr[i].

Here's the implementation for the random array generator in Java and C:

Java:

  public static int[] genRandArray(int arraySize) {
    int[] ret = new int[arraySize];
    Random rand = new Random();
    for (int i = 0; i < ret.length; i ++) {
      ret[i] = rand.nextInt(100);
    }
    return ret;
  }

C:

int *gen_rand_arr(int arr_size) {
  int *arr;
  if ((arr = malloc(arr_size * sizeof(int))) == NULL) {
    exit(1);
  }
  for (int i = 0; i < arr_size; i++) {
    arr[i] = arc4random_uniform(100);
  }
  return arr;
}

Solution

  • TL;DR

    In general, short snippets like this are not a fair way to compare languages. There are a lot of factors that comes into play. Code does not automatically get faster when you write it in C instead of Java. If that were the case, you could just write a Java2C converter. Compiler flags matters a lot, but also the skill of the programmer.

    Longer explanation

    I cannot say for sure, but this:

    for (j = i - 1; j >= 0 && arr[j] > temp; j--) {
      arr[j + 1] = arr[j];
    }
    

    is not very cache friendly, because you're traversing the list backwards. I would try changing the loop so that the outer loop do the backwards traversing instead of the inner loop.

    But I'd say that your question is fundamentally flawed. Code does not automatically get a performance boost just because you rewrite it from Java to C. In the same way, C programs does not automatically get faster because you rewrite them to assembly. One could say that C allows you to write faster programs than Java, but in the end, the result depend on the programmer.

    One thing that can speed up Java programs is the JIT compiler, which can do statistics to optimize the code during runtime for the specific conditions right there and then. While it is possible to make a C compiler to optimize for typical workload, it cannot optimize for current workload.

    You said that you used -O3 for the C code. But what target did you use? Did you optimize for your machine or a general one? The JIT compiler knows the target to optimize for. Try using -march=native

    Are you sure that you're using the same size for int? It's 32-bit in Java, but might be 64-bit in C. It might speed up the C code if you switch to int32_t instead. But it might also slow it down. (Very unlikely that this is the cause, but I just wanted to mention it as a possibility)

    C usually shines when it comes to very low level stuff.

    And if we look in your Java code:

    for (int i = 1; i < array.length; i ++) {
          int temp = array[i];
    

    In this example, a smart compiler can easily see that array will never be accessed out of bounds. But what if we instead would have something like:

    while(<condition>) {
          int temp = array[foo()];
    

    where it cannot be determined beforehand that array will not go out of bounds? Then Java is forced to do constant boundary checking to be able to throw exceptions. The code would be translated to something like:

    while(<condition>) {
          int i = foo();
          if(i >= array.length)
               throw exception;
          int temp = array[i];
    

    This gives security, but costs performance. C would simply allow you to access out of bounds, which is faster but may cause bugs that are hard to find.

    I found a nice question with more info: Why would it ever be possible for Java to be faster than C++?

    Apart from that, I can see that you're including the data generation in the benchmark. That's very bad. Generate the data before starting the timer. Like this:

    int *arrs[num_times];
    
    for (int i = 0; i < num_times; i++) 
        arrs[i] = gen_rand_arr(arr_size);
    
    gettimeofday(&start, NULL);
    
    for (int i = 0; i < num_times; i++) 
        f(arrs[i], arr_size);
    
    gettimeofday(&end, NULL);