java c sorting optimization benchmarking

Is there any reason why Java would be performing faster than C for sorting int arrays with manual Insertion / Selection / Radix sorts?

Platform: OpenBSD, compiler: gcc, javac (OpenJDK version 17)

I have made a few sorting benchmarks in various languages, and I'm rather surprised by the performance of the Java program over the C program.

I have programmed the exact same sorting algorithms in both languages, and the Java program finishes almost twice as fast, all other languages are slower than the C implementation except the Java one.

The benchmarks involve running the sorting algorithm on a random array of numbers a set number of times.

I am compiling the program with -O3 and -Ofast, so I cannot apply any more compiler optimizations.

The exact code can be found here, but here is an excerpt from it:

Java:

public static void benchmark(SortingFunction func, int arraySize, int numTimes, String name, BufferedOutputStream bo) throws IOException {
    int[][] arrs = new int[numTimes][arraySize];
    for (int i = 0; i < numTimes; i ++) {
      arrs[i] = genRandArray(arraySize);
    }
    long start = System.nanoTime();
    for (int i = 0; i < numTimes; i ++) {
      func.sort(arrs[i]);
    }
    long end = System.nanoTime();
    double time = (double)(end - start) / 1e9;
    System.out.println("It took " + time + " seconds to do " + name + " sort " +
            numTimes + " times on arrays of size " + arraySize
    );
    String out = name+","+numTimes+","+arraySize+","+time;
    for (char c : out.toCharArray()) {
      bo.write(c);
    }
    bo.write('\n');
  }

public static void insertionSort(int[] array) {
    for (int i = 1; i < array.length; i ++) {
      int temp = array[i];
      int j;
      for (j = i - 1; j >= 0 && array[j] > temp; j --) {
        array[j+1] = array[j];
      }
      array[j+1] = temp;
    }
  }

void benchmark(void (*f)(int *, int), int arr_size, int num_times, char *name,
               FILE *fp) {
  int *arrs[num_times];
  struct timeval start, end;
  double t;
  for (int i = 0; i < num_times; i++) {
    arrs[i] = gen_rand_arr(arr_size);
  }
  gettimeofday(&start, NULL);
  for (int i = 0; i < num_times; i++) {
    f(arrs[i], arr_size);
  }
  gettimeofday(&end, NULL);
  for (int i = 0; i < num_times; i++) {
    free(arrs[i]);
  }
  t = ((double)(end.tv_sec * 1000000 + end.tv_usec -
                (start.tv_sec * 1000000 + start.tv_usec))) /
      1000000;
  printf("It took %f seconds to do %s sort %d times on arrays of size %d\n", t,
         name, num_times, arr_size);

  if (fp != NULL) {
    fprintf(fp, "%s,%d,%d,%f\n", name, num_times, arr_size, t);
  }
}

void insertion_sort(int *arr, int arr_size) {
  for (int i = 1; i < arr_size; i++) {
    int temp = arr[i];
    int j;
    for (j = i - 1; j >= 0 && *(arr + j) > temp; j--) {
      arr[j + 1] = arr[j];
    }
    arr[j + 1] = temp;
  }
  return;
}

Are there some optimizations that Java is making to run faster that somehow change the algorithm? What is going on here?

Any explanations would be appreciated.

Here is a table of results that might help explain the difference:

Java:

name	rep	size	time
Insertion	10000	1200	1.033
Insertion	10000	5000	15.677
Insertion	10000	12000	88.190
Selection	10000	1200	3.118
Selection	10000	5000	48.377
Selection	10000	12000	268.608
Radix	10000	1200	0.385
Radix	10000	5000	1.491
Radix	10000	12000	3.563
Bogo	1	11	1.330
Bogo	1	12	0.572
Bogo	1	13	122.777

name	rep	size	time
Insertion	10000	1200	1.766
Insertion	10000	5000	26.106
Insertion	10000	12000	140.582
Selection	10000	1200	4.011
Selection	10000	5000	60.442
Selection	10000	12000	340.608
Radix	10000	1200	0.430
Radix	10000	5000	1.788
Radix	10000	12000	4.295
Bogo	1	11	1.378
Bogo	1	12	2.296
Bogo	1	13	1586.73

Edit:

I modified the benchmarking function to generate the arrays beforehand, in Java it overflows the heap, and in C it makes it not much faster (around 25%, but Java is still faster).

fwiw I also changed how I indexed things in C from *(arr + i) to arr[i].

Here's the implementation for the random array generator in Java and C:

Java:

  public static int[] genRandArray(int arraySize) {
    int[] ret = new int[arraySize];
    Random rand = new Random();
    for (int i = 0; i < ret.length; i ++) {
      ret[i] = rand.nextInt(100);
    }
    return ret;
  }

int *gen_rand_arr(int arr_size) {
  int *arr;
  if ((arr = malloc(arr_size * sizeof(int))) == NULL) {
    exit(1);
  }
  for (int i = 0; i < arr_size; i++) {
    arr[i] = arc4random_uniform(100);
  }
  return arr;
}

Solution

TL;DR

In general, short snippets like this are not a fair way to compare languages. There are a lot of factors that comes into play. Code does not automatically get faster when you write it in C instead of Java. If that were the case, you could just write a Java2C converter. Compiler flags matters a lot, but also the skill of the programmer.

Longer explanation

I cannot say for sure, but this:

for (j = i - 1; j >= 0 && arr[j] > temp; j--) {
  arr[j + 1] = arr[j];
}

is not very cache friendly, because you're traversing the list backwards. I would try changing the loop so that the outer loop do the backwards traversing instead of the inner loop.

But I'd say that your question is fundamentally flawed. Code does not automatically get a performance boost just because you rewrite it from Java to C. In the same way, C programs does not automatically get faster because you rewrite them to assembly. One could say that C allows you to write faster programs than Java, but in the end, the result depend on the programmer.

One thing that can speed up Java programs is the JIT compiler, which can do statistics to optimize the code during runtime for the specific conditions right there and then. While it is possible to make a C compiler to optimize for typical workload, it cannot optimize for current workload.

You said that you used -O3 for the C code. But what target did you use? Did you optimize for your machine or a general one? The JIT compiler knows the target to optimize for. Try using -march=native

Are you sure that you're using the same size for int? It's 32-bit in Java, but might be 64-bit in C. It might speed up the C code if you switch to int32_t instead. But it might also slow it down. (Very unlikely that this is the cause, but I just wanted to mention it as a possibility)

C usually shines when it comes to very low level stuff.

And if we look in your Java code:

for (int i = 1; i < array.length; i ++) {
      int temp = array[i];

In this example, a smart compiler can easily see that array will never be accessed out of bounds. But what if we instead would have something like:

while(<condition>) {
      int temp = array[foo()];

where it cannot be determined beforehand that array will not go out of bounds? Then Java is forced to do constant boundary checking to be able to throw exceptions. The code would be translated to something like:

while(<condition>) {
      int i = foo();
      if(i >= array.length)
           throw exception;
      int temp = array[i];

This gives security, but costs performance. C would simply allow you to access out of bounds, which is faster but may cause bugs that are hard to find.

I found a nice question with more info: Why would it ever be possible for Java to be faster than C++?

Apart from that, I can see that you're including the data generation in the benchmark. That's very bad. Generate the data before starting the timer. Like this:

int *arrs[num_times];

for (int i = 0; i < num_times; i++) 
    arrs[i] = gen_rand_arr(arr_size);

gettimeofday(&start, NULL);

for (int i = 0; i < num_times; i++) 
    f(arrs[i], arr_size);

gettimeofday(&end, NULL);