Search code examples
javaalgorithmconventionsselection-sortbranch-prediction

Use branch prediction with no else statement


I am currently implementing selectionsort. Using the code below and the driver file to test it below. I am currently trying to do micro optimizations to see what speeds it up.

  public static void selectionsort(int[] data) {
    for (int i = 0; i < data.length; i++) {
      int minx = i;  
      for (int j = i+1; j < data.length; j++) {
          if(data[j] < data[minx]){
            minx = j;
          }
          // else{
          //   minx=minx;
          // }
        }
      int swap = data[minx]; 
      data[minx] = data[i]; 
      data[i] = swap; 
    }
  }
import java.util.Arrays;
import java.util.Random;

public class Driver {
    public static void main(String[] args) {
        Random r = new Random(12);
        // CODE-BLOCK-A
        // run selection sort version of tests
        double avg = 0;
        for (int i = 0; i < 100; i++) {
            long time = System.currentTimeMillis();
        int[] sgarr = randIntArr(Integer.parseInt(args[0]),r);
        int[] tests = Arrays.copyOf(sgarr, sgarr.length);
        switch (args[1]) {
            case "selection":
                Sorts.selectionsort(sgarr);
                break;
            case "bubble":
                Sorts.bubblesort(sgarr);
            default:
                Sorts.insertionsort(sgarr);
                break;
        }
        Arrays.sort(tests);
        if (!equals(tests, sgarr)) {
            System.out.println("FAIL TO SORT!");
            System.exit(0);
        }
        // System.out.println("SUCCESS!");
    

        long etime = System.currentTimeMillis();
        avg += (etime-time);
        }

    System.out.println("Took "+ avg / (100.0*1000.0) +" seconds");

    }

    public static int[] randIntArr(int count,Random r) {
        int seed = r.nextInt(12433);
        r.setSeed(seed);
        int[] arr = new int[count];
        for (int i = 0; i < count; i++) {
            arr[i] = (int) (r.nextDouble() * 100000);
        }
        return arr;
    }

    public static boolean equals(int[] d1, int[] d2) {
        if (d1.length != d2.length) {
            return false;
        }
        for (int i = 0; i < d2.length; i++) {
            if (d1[i] != d2[i]) {
                return false;
            }
        }

        return true;
    }
}

What I have tried doing to speed it up is using an else block that should do nothing. I expected this to slow it down as now it is checking an if statement every time like normal but also assigning minx to itself which should slow it down. But if I put an empty else statement the improvements go away.

However when I run it using this command, when the else block is commented out I get:

javac Driver.java 
&& java Driver 40000 selection
Took 0.34416 seconds
However, when I uncomment out the else block and run the same command it gets noticeably faster
Took 0.30086 seconds

Meaning for the best case scenarios on both it is about a 13.43% difference.

I am assuming that this change is significant enough to not just be random considering I also ran each one multiple times. I then asked my CS teacher about it and he explained it is probably branch prediction. Which does make sense however what I wonder is if I get the same performance enhancements without an essentially useless else statement or if I can't what is considered better practice to keep or to remove it. Or is 13.43% significant, I don't believe it is due to my computer because I ran it many times and the values were within 3% of each time. This being ran on x86, if that matters.


Tested currently using this main file

package org.example;



import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class Main {

    @Param({"100", "1000", "10000"}) // You can add more values for array sizes
    private int arraySize;

    private int[] randomArray;

    @Setup(Level.Iteration)
    public void setup() {
        // Initialize the array with random data
        randomArray = createRandomArray(arraySize);
    }

    @Benchmark
    public void selectionSortBenchmark() {
        Sorts.selectionsort(randomArray.clone());
    }

    private int[] createRandomArray(int size) {
        int[] array = new int[size];
        for (int i = 0; i < size; i++) {
            array[i] = (int) (Math.random() * 1000); // Adjust the range as needed
        }
        return array;
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(Main.class.getSimpleName())
                .forks(1)
                .warmupIterations(5)
                .measurementIterations(5)
                .build();

        new Runner(opt).run();
    }
}

Solution

  • The reason as @PeterCordes said in the comments was a testing issue when using correct testing software like JMH all the data points are very similar and definitely within my computer have more or less resources available or some other randomness-ish. Here are the results when using JMH:

    Commented out else block
    Main.selectionSortBenchmark          100  avgt    5   0.003 ± 0.001  ms/op
    Main.selectionSortBenchmark         1000  avgt    5   0.266 ± 0.050  ms/op
    Main.selectionSortBenchmark        10000  avgt    5  19.087 ± 0.702  ms/op
    Not commented out else block
    Main.selectionSortBenchmark          100  avgt    5   0.003 ±  0.001  ms/op
    Main.selectionSortBenchmark         1000  avgt    5   0.234 ±  0.010  ms/op
    Main.selectionSortBenchmark        10000  avgt    5  18.468 ±  3.204  ms/op
    Empty else block
    Main.selectionSortBenchmark          100  avgt    5   0.003 ±  0.001  ms/op
    Main.selectionSortBenchmark         1000  avgt    5   0.273 ±  0.005  ms/op
    Main.selectionSortBenchmark        10000  avgt    5  20.022 ±  2.740  ms/op