java, multithreading, concurrency, threadpool, java.util.concurrent

How to determine coreThreadSize if having multiple threadpools in application


There are a lot of online resources showing how to determine the best coreThreadSize when working with a SINGLE thread pool. Brian Goetz, in his famous book "Java Concurrency in Practice", recommends the following formula:

 Number of threads = Number of Available Cores * (1 + Wait time / Service time)
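As a rough sketch (not from the book), the formula can be expressed in code; the wait/service ratio is something you would have to measure for your own workload:

```java
// Sketch of Goetz's sizing formula. The waitTime/serviceTime ratio
// is an assumption that must be measured for the actual workload.
public class PoolSizer {

    // threads = cores * (1 + waitTime / serviceTime)
    public static int poolSize(int cores, double waitTime, double serviceTime) {
        return (int) (cores * (1 + waitTime / serviceTime));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Purely CPU-bound work (waitTime = 0) sizes the pool to the core count
        System.out.println(poolSize(cores, 0.0, 1.0));
    }
}
```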

However, in the real world we often introduce multiple thread pools to better organise different tasks, instead of putting all tasks into a single thread pool (even though a single thread pool might also work).

Take an example:

  • we have a machine with 8 cores and only run 1 instance of our application (no container env involved)
  • we have 4 different time-consuming tasks to run whenever we receive a request; for now, assume all 4 tasks have the same P999 duration, such that Wait time / Service time = 1
  • we declare 4 thread pools in our application instead of a single thread pool, so each task can be submitted to its own pool for task management, e.g.
  • Original Request
    • Client A -> Task A -> submit to Threadpool A
    • Client B -> Task B -> submit to Threadpool B
    • Client C -> Task C -> submit to Threadpool C
    • Client D -> Task D -> submit to Threadpool D
    • Do all 4 tasks concurrently
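A minimal sketch of this layout (pool sizes and task bodies are placeholders; in a real application the pools would be long-lived fields rather than created per request):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: one dedicated pool per task type, all four tasks
// started concurrently for each incoming request.
public class PerTaskPools {

    public static String handleRequest() {
        ExecutorService poolA = Executors.newFixedThreadPool(2);
        ExecutorService poolB = Executors.newFixedThreadPool(2);
        ExecutorService poolC = Executors.newFixedThreadPool(2);
        ExecutorService poolD = Executors.newFixedThreadPool(2);
        try {
            // Placeholder task bodies; real tasks would do the slow work
            CompletableFuture<String> a = CompletableFuture.supplyAsync(() -> "A", poolA);
            CompletableFuture<String> b = CompletableFuture.supplyAsync(() -> "B", poolB);
            CompletableFuture<String> c = CompletableFuture.supplyAsync(() -> "C", poolC);
            CompletableFuture<String> d = CompletableFuture.supplyAsync(() -> "D", poolD);
            // Wait for all four concurrent tasks before responding
            return a.join() + b.join() + c.join() + d.join();
        } finally {
            poolA.shutdown(); poolB.shutdown(); poolC.shutdown(); poolD.shutdown();
        }
    }
}
```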

Question:

  • Are there any pros/cons to working with a single thread pool vs. multiple pools for the same tasks? (Pros/cons for different situations are welcome!)
  • How should we calculate the core thread size if using multiple pools? E.g., by dividing by the number of thread pools?

e.g., with a single thread pool we might set the thread size as 16 according to the formula (8 × (1 + 1) = 16); with 4 thread pools, do we have to give each pool a core thread size of only 4 (16 / 4 = 4)? If so, can we not introduce more than 16 thread pools in our application?


Solution

  • With a single thread pool, this rule of thumb is intended to make the best use of available computing resources. If you consider CPU-bound work (work that spends effectively no time waiting) there is no benefit to having more threads than CPU cores, as no work can be done by threads that are waiting to be scheduled on a core. Having more threads than cores will only waste memory and cause context-switching overhead. As the time each thread spends waiting increases (for I/O or lock acquisition, for example) it becomes more beneficial to have a greater number of threads, as another thread can be scheduled on a core while the currently-scheduled thread waits.

    When you have multiple thread pools, sizing them gets more complex, because you also have to consider the distribution of work between the pools.

    Consider an example where you have entirely CPU-bound tasks (wait time/service time = 0). If you size each pool according to this formula on your 8 core machine, you'll give each pool a size of 8, with 32 threads in total between the 4 pools. However, only 8 of those will be able to run at a time. If you have 8 concurrent tasks of type A, all of pool A's threads will be running and all of the threads from other pools will be waiting for CPU. If you have 2 of each task type running concurrently, then 2 threads from each pool will be running while 6 threads from each pool will have to wait. Those waiting threads provide no benefit to overall application throughput.

    So you might think to use this formula to calculate the total number of threads, and divide them between the pools. With 8 threads and 4 pools, that gives each pool a thread size of 2. This guarantees that all threads will have access to a CPU core when needed, but this is only optimal if the workload is evenly distributed between each pool at all times. In a realistic application, this may be unlikely. If you were to suddenly get 8 requests of type A, only 2 of them would be handled concurrently while the other thread pools wait for incoming work.

    If you have a very predictable workload, you can weight the size of each pool by the proportion of requests that will use it; however, this may still not use resources optimally at all times. For CPU-bound work, it will generally be more efficient to use a single thread pool, where each thread can handle any type of task.
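    As an illustration of the weighting idea, here is a sketch that sizes each pool by an assumed share of the request mix. Note that rounding, and enforcing a minimum of one thread per pool, can make the total drift from the target:

```java
// Sketch: weight each pool's size by the observed share of requests
// that each task type receives. The shares are assumptions that would
// come from measuring a real workload.
public class WeightedSizing {

    public static int[] weightedSizes(int totalThreads, double[] shares) {
        int[] sizes = new int[shares.length];
        for (int i = 0; i < shares.length; i++) {
            // Round to the nearest thread, but never starve a pool entirely
            sizes[i] = Math.max(1, (int) Math.round(totalThreads * shares[i]));
        }
        return sizes;
    }
}
```

    With a total of 8 threads and shares of 50% / 25% / 12.5% / 12.5%, this would give pool sizes of 4, 2, 1 and 1.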

    One possible reason to use separate pools for different task types would be to ensure fairness between different types of tasks, or to ensure that higher priority requests can be handled immediately. Maybe you want to have idle threads ready to take on a task of type B even if pool A is inundated with work. This isn't common—most applications would prefer to operate under a "first in, first out" (FIFO) model—but it does apply in some use cases. Still, there are sometimes more optimal ways to achieve this. If each task can be processed quickly, you could have work scheduled using a priority queue or a round-robin approach to ensure that a single thread pool gives time to different task types in something other than FIFO order. If tasks may take a long (or highly variable) time to complete, having threads ready and waiting for tasks of type B may still provide a responsiveness benefit, as using a single thread pool may mean that all of the threads are busy handling long-running tasks. Implementing a mechanism to interrupt those tasks is possible but more complex than using a separate thread pool to handle the high-priority tasks.
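    A sketch of the priority-queue variant mentioned above, using `ThreadPoolExecutor` with a `PriorityBlockingQueue`. One gotcha: `submit()` wraps tasks in a `FutureTask`, which is not `Comparable`, so tasks must be passed via `execute()` and implement `Comparable` themselves:

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: a single pool that serves higher-priority tasks first
// instead of FIFO, by backing the executor with a priority queue.
public class PriorityPool {

    static class PrioritizedTask implements Runnable, Comparable<PrioritizedTask> {
        final int priority;          // lower value = higher priority
        final Runnable work;

        PrioritizedTask(int priority, Runnable work) {
            this.priority = priority;
            this.work = work;
        }

        @Override public void run() { work.run(); }

        @Override public int compareTo(PrioritizedTask other) {
            return Integer.compare(priority, other.priority);
        }
    }

    public static ThreadPoolExecutor newPriorityPool(int threads) {
        // Tasks must be handed in via execute(), not submit(): submit()
        // wraps them in FutureTask, which is not Comparable and would
        // throw ClassCastException when the queue orders its elements.
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new PriorityBlockingQueue<>());
    }
}
```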

    It can sometimes be beneficial to use task-specific thread pools when certain tasks require the use of special resources. Imagine an application with two task types A and B, both CPU-intensive and under high load, but where A always requires holding a specific mutually-exclusive lock for its duration and B does not. It might make sense to devote one thread/core to task A and 7 to task B. That way, each pool can proceed with its workload without blocking. If you were to use a shared thread pool for both tasks, whenever a thread took a request of type A, it would need to wait to acquire that lock, necessitating a larger number of threads in total in order to maintain high CPU utilisation.
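    A sketch of that dedicated-pool arrangement (the 1 + 7 split on an 8-core machine and the global lock are the assumptions from the example above):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: task A must hold one mutually-exclusive lock for its whole
// duration, so a single dedicated thread serves A while the remaining
// seven cores run B without ever blocking on that lock.
public class LockAwarePools {

    private static final Object TASK_A_LOCK = new Object();

    static final ExecutorService poolA = Executors.newSingleThreadExecutor();
    static final ExecutorService poolB = Executors.newFixedThreadPool(7);

    static void runTaskA(Runnable work) {
        poolA.execute(() -> {
            synchronized (TASK_A_LOCK) {   // held for the task's full duration
                work.run();
            }
        });
    }

    static void runTaskB(Runnable work) {
        poolB.execute(work);               // never touches TASK_A_LOCK
    }
}
```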

    The bottom line is that you have to take application-specific considerations into account when optimising, including which specific constraints your application needs to optimise for. As a default, a single thread pool sized according to a typical workload is a simple approach that often provides a suitable result.