
Why Is Java Not Utilising All My CPU Cores Effectively


I am running Ubuntu on a machine with a quad-core CPU. I have written some test Java code that spawns a given number of threads, each of which simply increments a volatile variable for a given number of iterations.

I would expect the running time not to increase significantly while the number of threads is less than or equal to the number of cores, i.e. 4. In fact, these are the times I get, using the "real" time reported by the UNIX time command:

1 thread: 1.005s

2 threads: 1.018s

3 threads: 1.528s

4 threads: 1.982s

5 threads: 2.479s

6 threads: 2.934s

7 threads: 3.356s

8 threads: 3.793s

This shows that, as expected, adding a second thread barely increases the time, but the time then does increase with 3 and 4 threads.

At first I thought this could be because the OS was preventing the JVM from using all the cores, but I ran top, and it clearly showed that with 3 threads, 3 cores were running at ~100%, and with 4 threads, 4 cores were maxed out.

My question is: why does the code running on 3 or 4 cores not finish in roughly the same time as it does on 1 or 2, given that it is running in parallel on all the cores?

Here is my code for reference:

class Example implements Runnable {

    // using this so the compiler does not optimise the computation away
    volatile int temp;

    void delay(int arg) {
        for (int i = 0; i < arg; i++) {
            for (int j = 0; j < 1000000; j++) {
                this.temp += i + j;
            }
        }
    }

    int arg;
    int result;

    Example(int arg) {
        this.arg = arg;
    }

    public void run() {
        delay(arg);
        result = 42;
    }

    public static void main(String args[]) {

        // Get the number of threads (the command line arg)

        int numThreads = 1;
        if (args.length > 0) {
            try {
                numThreads = Integer.parseInt(args[0]);
            } catch (NumberFormatException nfe) {
                System.out.println("First arg must be the number of threads!");
            }
        }

        // Start up the threads

        Thread[] threadList = new Thread[numThreads];
        Example[] exampleList = new Example[numThreads];
        for (int i = 0; i < numThreads; i++) {
            exampleList[i] = new Example(1000);
            threadList[i] = new Thread(exampleList[i]);
            threadList[i].start();
        }

        // wait for the threads to finish

        for (int i = 0; i < numThreads; i++) {
            try {
                threadList[i].join();
                System.out.println("Joined with thread, ret=" + exampleList[i].result);
            } catch (InterruptedException ie) {
                System.out.println("Caught " + ie);
            }
        }
    }
}

Solution

  • The Core i5 in a Lenovo X1 Carbon is not a quad-core processor. It is a two-core processor with hyperthreading, so the operating system sees four logical CPUs and top reports each of them separately. Hyperthreading lets a core run a second thread's instructions while the first thread is stalled. When you are performing only trivial operations that do not result in frequent, long pipeline stalls, the hyperthreading hardware has little opportunity to weave another thread's operations into the pipeline, so you will not see performance equivalent to four actual cores.
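
As a rough check of this explanation, the sketch below compares the timings from the question against a deliberately simple model in which every thread needs the same amount of time on one physical core and hyperthreading contributes nothing. The class name `CoreModel` and the helper `predicted` are made up for this illustration, and the model itself is an assumption, not a measurement. The program also prints `Runtime.getRuntime().availableProcessors()`, which typically counts logical processors (hardware threads) rather than physical cores, so a value of 4 does not by itself prove that the machine has four cores.

```java
class CoreModel {

    // Wall-clock times (seconds) reported in the question; index 0 is 1 thread.
    static final double[] MEASURED = {
        1.005, 1.018, 1.528, 1.982, 2.479, 2.934, 3.356, 3.793
    };

    // Idealised model (an assumption): each thread needs t1 seconds of one
    // physical core, the work is shared evenly between the cores, and
    // hyperthreading gives no extra throughput.
    static double predicted(int threads, int cores, double t1) {
        return t1 * Math.max(1.0, (double) threads / cores);
    }

    public static void main(String[] args) {
        // Typically the number of logical processors, not physical cores.
        int logical = Runtime.getRuntime().availableProcessors();
        System.out.println("Logical processors visible to the JVM: " + logical);

        double t1 = MEASURED[0]; // single-thread time used as the baseline
        for (int n = 1; n <= MEASURED.length; n++) {
            System.out.printf(
                "%d threads: measured %.3fs, 2-core model %.3fs, 4-core model %.3fs%n",
                n, MEASURED[n - 1], predicted(n, 2, t1), predicted(n, 4, t1));
        }
    }
}
```

Under these assumptions the two-core model predicts about 1.5 times the single-thread time for three threads and about twice for four, which is close to the measured 1.528s and 1.982s. A four-core model would predict roughly 1s for anything up to four threads, which is not what the question observed. The measured times for larger thread counts sit slightly below the two-core prediction, which may reflect a small gain from hyperthreading.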