I'm trying to understand the best possible Java ThreadPoolTaskExecutor configuration, latency-wise, that I can define as we move over to OkHttpClient. Currently our definition is the following:
```xml
<property name="corePoolSize" value="#{ T(java.lang.Math).max(32,numCpu) * 2 }" />
<property name="maxPoolSize" value="#{ T(java.lang.Math).max(32,numCpu) * 8 }" />
<property name="queueCapacity" value="200"/>
```
That is, the queue capacity (beyond which extra threads, up to the maximum, are created) is 200, the minimum thread count is max(32, numCpu) * 2, and the maximum thread count is max(32, numCpu) * 8. In our case numCpu can vary from 16 to 24 (though if hyper-threading is taken into account that number doubles, right?). But when I think about it, I'm not sure the number of threads here should be tied to the CPU count at all. These are the sending/receiving threads of an HTTP client, not business-logic threads, so perhaps CPU count shouldn't even be a factor. For context, here is roughly what that XML maps to programmatically and how I understand it would be wired into OkHttp, sketched below.
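This is only a sketch of my understanding: the `numCpu` value is assumed to come from `Runtime.availableProcessors()` (which already counts hyper-threaded logical cores), and OkHttp's Dispatcher only uses the supplied executor for async (`enqueue`) calls.

```java
import java.util.concurrent.ExecutorService;

import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

import okhttp3.Dispatcher;
import okhttp3.OkHttpClient;

public class HttpExecutorConfig {

    public static OkHttpClient buildClient() {
        // availableProcessors() reports logical cores, i.e. hyper-threads included
        int numCpu = Runtime.getRuntime().availableProcessors();

        // Same sizing as the XML above: core = max(32, numCpu) * 2, max = max(32, numCpu) * 8
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(Math.max(32, numCpu) * 2);
        executor.setMaxPoolSize(Math.max(32, numCpu) * 8);
        executor.setQueueCapacity(200);
        executor.initialize();

        // OkHttp's Dispatcher accepts an ExecutorService directly; note it also has
        // its own concurrency caps (maxRequests / maxRequestsPerHost) on top of the pool.
        ExecutorService delegate = executor.getThreadPoolExecutor();
        Dispatcher dispatcher = new Dispatcher(delegate);

        return new OkHttpClient.Builder()
                .dispatcher(dispatcher)
                .build();
    }
}
```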
Any opinions/advice?
It sounds to me like your thread pool is being used to concurrently make lots of HTTP connections, meaning your performance is limited not by CPU usage but by I/O (and potentially memory too). The "optimal" number of threads is going to be limited by a number of other factors ...
1. Link speed between your client and endpoints.
Let's say your client is connected to a 1 Gbps link but, somewhere down the line, each of your endpoints can only serve you data at 1 Mbps. To fully utilize your 1 Gbps link you would need to run 1000 connections concurrently (1 Gbps / 1 Mbps = 1000), meaning your thread pool needs to run 1000 threads. But this could be problematic as well because of another issue ...
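To put numbers on that, here is a rough sketch, assuming you do end up on OkHttp and use its Dispatcher for async calls. The bandwidth figures are the hypothetical ones from above, and the per-host cap is purely illustrative:

```java
import okhttp3.Dispatcher;
import okhttp3.OkHttpClient;

public class LinkSpeedSizing {

    public static void main(String[] args) {
        // Back-of-the-envelope: connections needed to saturate the local link
        double localLinkMbps = 1000.0;   // 1 Gbps client uplink (assumed)
        double perEndpointMbps = 1.0;    // 1 Mbps per endpoint (assumed)
        int connectionsNeeded = (int) Math.ceil(localLinkMbps / perEndpointMbps); // = 1000

        // OkHttp caps concurrent async calls at 64 total / 5 per host by default,
        // so a big thread pool alone won't get you to 1000 in-flight requests.
        Dispatcher dispatcher = new Dispatcher();
        dispatcher.setMaxRequests(connectionsNeeded);
        dispatcher.setMaxRequestsPerHost(Math.min(connectionsNeeded, 100)); // illustrative

        OkHttpClient client = new OkHttpClient.Builder()
                .dispatcher(dispatcher)
                .build();

        System.out.println("Target concurrent connections: " + connectionsNeeded);
    }
}
```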
2. Memory usage per thread is non-zero, even if the threads aren't doing anything intensive.
The default amount of stack space allocated to a Java thread varies by vendor, but it's on the order of 1MB. This doesn't sound like a whole lot, but if you need to run thousands of threads to keep as many client connections active at a time, you will need to allocate gigabytes of RAM for stack space alone. You can adjust the stack space allocated per thread using the -Xss[size] VM argument, but this is global to the VM, so shrinking the stack size may cause problems in other areas of your program, depending on what you are doing.
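If changing -Xss globally is too blunt, one alternative worth knowing about is a ThreadFactory that requests a smaller stack just for this pool's threads. Sketch only: the stack-size argument to Thread's constructor is merely a hint that some JVMs ignore, so verify the effect under your own load.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class SmallStackThreads {

    // Thread's four-arg constructor accepts a requested stack size in bytes.
    // The JVM treats it as a suggestion (and may ignore it on some platforms).
    static ThreadFactory smallStackFactory(long stackBytes) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> new Thread(
                null,                                   // default thread group
                runnable,
                "http-io-" + counter.incrementAndGet(), // descriptive name for thread dumps
                stackBytes);                            // e.g. 256 KiB instead of ~1 MiB
    }

    public static void main(String[] args) {
        // 1000 threads * ~1 MiB default stack is roughly 1 GiB of stack space;
        // at 256 KiB per thread that drops to roughly 250 MiB.
        ExecutorService pool =
                Executors.newFixedThreadPool(1000, smallStackFactory(256 * 1024));
        pool.shutdown();
    }
}
```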
3. Average HTTP request size.
Sometimes, it's going to boil down to how much data you expect to transfer per POST/GET call. Recall that each TCP connection requires an initial handshake before any data can be sent. If the amount of data you expect to transmit over the life of an HTTP call is very small, you may not be able to keep thousands of connections running concurrently, even if you have thousands of threads at your disposal. If the amount is very large, it may only take a few concurrent connections to max out the total bandwidth available to your client.
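Connection reuse takes a lot of the sting out of that handshake cost when individual requests are small. If you do end up on OkHttp, its ConnectionPool is the relevant knob; the numbers below are illustrative, not tuned recommendations:

```java
import java.util.concurrent.TimeUnit;

import okhttp3.ConnectionPool;
import okhttp3.OkHttpClient;

public class KeepAliveExample {

    public static OkHttpClient buildClient() {
        // Keep more idle connections around, for longer, so repeated small requests to the
        // same hosts can skip the TCP (and TLS) handshake instead of paying it every time.
        ConnectionPool pool = new ConnectionPool(
                200,                 // max idle connections kept for reuse (illustrative)
                5, TimeUnit.MINUTES  // keep-alive duration for idle connections (illustrative)
        );

        return new OkHttpClient.Builder()
                .connectionPool(pool)
                .build();
    }
}
```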
Finally ...
You may not be able to predict the link speed of every connection if all of your endpoints are running out there on the web. I think the best you can do is benchmark the performance of different configurations, while considering each of these factors, and choose the configuration that seems to give the best performance in your typical operating environment. It will likely be somewhere between N and 1000, where N is the number of cores you run, but nailing that number down to something specific will take a little bit of elbow grease :)
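As a starting point for that benchmarking, something along these lines would let you compare a few pool sizes under your real traffic. Treat it as a rough sketch: the URL is hypothetical, the calls are synchronous, and there is no warm-up or percentile reporting.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;

public class PoolSizeBenchmark {

    // Hypothetical endpoint; point this at something that resembles your real traffic.
    private static final String URL = "https://example.com/ping";
    private static final int REQUESTS = 2000;

    public static void main(String[] args) throws Exception {
        OkHttpClient client = new OkHttpClient();

        // Try a spread of pool sizes between "number of cores" and "huge".
        for (int threads : List.of(16, 64, 256, 1024)) {
            long elapsedMs = run(client, threads);
            System.out.printf("%4d threads -> %d requests in %d ms%n",
                    threads, REQUESTS, elapsedMs);
        }
    }

    private static long run(OkHttpClient client, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(REQUESTS);
        long start = System.nanoTime();

        for (int i = 0; i < REQUESTS; i++) {
            pool.submit(() -> {
                Request request = new Request.Builder().url(URL).build();
                try (Response response = client.newCall(request).execute()) {
                    response.body().bytes(); // drain the body so the connection can be reused
                } catch (IOException ignored) {
                    // a real benchmark should count failures, not swallow them
                } finally {
                    done.countDown();
                }
            });
        }

        done.await();
        pool.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```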