Search code examples
javamultithreadingcpumulticore

Java multi-threading, best size for a pool depending on CPU cores (virtual with multi-threading and physical)


I am playing around with multi-treading in Java (Sun JDK 1.7 64 bit), trying to grasp some concepts better. What I find puzzling is finding the size of a thread pool for the executor and impact of that setting on performance. Here is my basic code:

public class Program {

static int bestThreads = 0;
static long bestTime = Integer.MAX_VALUE;

public static void main(String[] args) throws InterruptedException, ExecutionException {

    int cores = Runtime.getRuntime().availableProcessors();

    for (int sizeOfPool = 1; sizeOfPool <= cores; sizeOfPool++) {
        ExecutorService exec = Executors.newFixedThreadPool(sizeOfPool);

        //System.out.println("Started");

        int noOftasks = 1000;
        for (int i = 0; i < noOftasks; i++) {
            Calculator c = new Calculator();
            exec.submit(c);
        }
        long start = System.currentTimeMillis();

        exec.shutdown();
        exec.awaitTermination(1000, TimeUnit.DAYS);

        long stop = (System.currentTimeMillis() - start);

        //System.out.println("Done " + noOftasks + " tasks in " + stop + " on " + sizeOfPool + " threads");

        if (bestTime > stop) {
            bestTime = stop;
            bestThreads = sizeOfPool;
        }

    }

    System.out.println("Best size of pool " + bestThreads + " result in " + bestTime + " ms");

}

public static class Calculator implements Runnable {

    @Override
    public void run() {
        doJob();
    }

}

//Can be whatever this just gives me a few milliseconds worth of CPU load since I don't want to use Thread.sleep()
public static void doJob() {
    for (int j = 0; j < 1E3; j++) {
        Math.round(Math.sin(Math.sqrt(Math.random())));

    }
}

When I run this program I get that the setting that used the smallest amount of time was the one using N threads where N is usually 2 (Meaning I should use 2 threads for the size of my thread pool). I don't get why this happens because number of processors I get from .availableProcessors() is 4 (I am using i3 with multi-threading, it's on a laptop and Windows shows that all threads are active when running the program). Also I usually get different results when I change the amount of work done with:

1E1 -> N=4

1E2 -> N=3 or 2

1E3 -> N=2

1E4 -> N=2

but even then in most cases I get N=2;

Can somebody please explain why do I get results like this and what is usually suggested pool size depending on CPU that the program is ran on.

Here is a little bit more output that I find strange:

Done 1000 tasks in 195 on 1 threads //Ok it takes around 200ms for this processor to do this, over-clocking it will help here I imagine

Done 1000 tasks in 134 on 2 threads //I know I can't get 2x increase because of context switching and some other impacts of thread creation overhead but this is a nice speedup

Done 1000 tasks in 138 on 3 threads //Almost same as 2 threads, why is it not worse or better

Done 1000 tasks in 210 on 4 threads //Worse then 1 thread, this is the one I really don't get


Solution

  • Your "test" job is completely CPU bound, that means it depends only on the CPU/core speed. While the i3 claims to have 4 cores, its a dual core CPU (2 cores with 2 threads each - aka hyperthreading).

    Hyperthreading does not give you 4 full cores, each core works on either of its two threads (it switches automatically, e.g. when a thread waits for a memory access). So in your test case the i3 CPU performs best with two threads, since that is the maximum your CPU can handle (truely) simultaneously.

    With a different test (e.g. with lots of memory accesses, or waiting for I/O) you would get different "ideal" thread numbers.

    Edit: There is no way I know of to distinguish between a real "physical" core and a "virtual" core in java. Newer AMD CPU's have their own quirks in that regard (separate cores, but FPU shared between 2 cores), so its really very low-level technology dependent. To really get all the details you probably would need to read the CPU-Id and check the data-sheet for that CPU.

    The reason you get sometime 2 and sometimes 3 is probably due to multithreaded tests not being truely deterministic (the operation system will inevitably eat some CPU at random times). Also, short duration test show often a lot of variation in java due to the JIT warming up (look for microbenchmarking, its a complicated topic).

    You should see a difference between i3 / i7 regardless.