java multithreading tcp real-time nio

What are the current lowest TCP latencies one can accomplish in Java?


We are currently using a multithreaded solution for a high-performance TCP server handling 20 simultaneous connections. Our average latencies run around 200 microseconds per message, and we have been struggling to tame the GC activity, which can produce 1 ms+ outliers. Low latency is our utmost goal for this server and we are aware that our current numbers are bad. We are evaluating single-threaded approaches so that all 20 connections can be handled by a single thread.

What is the current floor for TCP latency in Java? In other words, how fast can two machines exchange messages in Java through a TCP socket over a 10Gb network?


Solution

  • A multithreaded server is not the way to go for latency, and 200 micros is indeed too high. For ultra-low-latency network applications it is essential to use a single-threaded, asynchronous, non-blocking network library. You can easily handle these 20 socket connections inside the same reactor thread (i.e. network selector), which can be pinned to a dedicated, isolated CPU core. Moreover, in Java it is essential to use a network library that leaves zero garbage behind, since GC activity will most likely block the critical reactor thread and introduce the bad outliers you are observing.

    To give you an idea of achievable TCP latencies, take a look at these benchmarks using CoralReactor, which is an ultra-low-latency and garbage-free network library implemented in Java.

    Messages: 1,000,000 (size 256 bytes)
    Avg Time: 2.15 micros
    Min Time: 1.976 micros
    Max Time: 64.432 micros
    Garbage created: zero   
    75% = [avg: 2.12 micros, max: 2.17 micros]
    90% = [avg: 2.131 micros, max: 2.204 micros]
    99% = [avg: 2.142 micros, max: 2.679 micros]
    99.9% = [avg: 2.147 micros, max: 3.022 micros]
    99.99% = [avg: 2.149 micros, max: 5.604 micros]
    99.999% = [avg: 2.149 micros, max: 7.072 micros]
    

    Keep in mind that 2.15 micros is over loopback, so I am not considering network and OS/kernel latencies. For a 10Gb network, the over-the-wire latency for a 256-byte message will be at least 382 nanoseconds from NIC to NIC. If you are using a network card that supports kernel bypass (e.g. Solarflare's OpenOnload), then the OS/kernel latency should be very low.

    Disclaimer: I am one of the developers of CoralReactor.
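
    For illustration, the single-threaded reactor model described in this answer can be sketched with the plain java.nio Selector API. This is only a minimal sketch and not CoralReactor's API: the class name EchoReactor, the echo behaviour, and the 4096-byte buffer size are assumptions made for the example. It shows one thread owning the selector and all connections, with one preallocated buffer reused for every read. The standard selector may still allocate (for example, the selected-key iterator), so the sketch illustrates the threading model rather than the zero-garbage property.

    ```java
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.net.InetSocketAddress;
    import java.net.StandardSocketOptions;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;

    // Hypothetical illustration class; this is NOT CoralReactor's API.
    final class EchoReactor implements Runnable {
        private final Selector selector;
        private final ServerSocketChannel server;
        // One direct buffer, allocated once and reused for every read.
        private final ByteBuffer buffer = ByteBuffer.allocateDirect(4096);
        private volatile boolean running = true;

        EchoReactor(int port) throws IOException {
            selector = Selector.open();
            server = ServerSocketChannel.open();
            server.configureBlocking(false);
            server.bind(new InetSocketAddress("127.0.0.1", port)); // port 0 picks a free port
            server.register(selector, SelectionKey.OP_ACCEPT);
        }

        int port() throws IOException {
            return ((InetSocketAddress) server.getLocalAddress()).getPort();
        }

        void stop() {
            running = false;
            selector.wakeup(); // unblock select() so the loop can observe the flag
        }

        @Override
        public void run() {
            try {
                while (running) {
                    selector.select(); // blocks until a channel is ready or wakeup() is called
                    Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                    while (keys.hasNext()) {
                        SelectionKey key = keys.next();
                        keys.remove();
                        if (!key.isValid()) continue;
                        if (key.isAcceptable()) accept();
                        else if (key.isReadable()) echo(key);
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } finally {
                closeAll();
            }
        }

        private void accept() throws IOException {
            SocketChannel ch = server.accept();
            if (ch == null) return;
            ch.configureBlocking(false);
            ch.setOption(StandardSocketOptions.TCP_NODELAY, true); // disable Nagle's algorithm
            ch.register(selector, SelectionKey.OP_READ);
        }

        private void echo(SelectionKey key) {
            SocketChannel ch = (SocketChannel) key.channel();
            try {
                buffer.clear();
                if (ch.read(buffer) < 0) { ch.close(); return; } // peer closed the connection
                buffer.flip();
                // Simplification: this spins if the socket send buffer is full.
                while (buffer.hasRemaining()) ch.write(buffer);
            } catch (IOException e) {
                try { ch.close(); } catch (IOException ignored) { }
            }
        }

        private void closeAll() {
            try {
                for (SelectionKey key : selector.keys()) key.channel().close();
                selector.close();
            } catch (IOException ignored) { }
        }
    }
    ```

    To try it, start the reactor with new Thread(new EchoReactor(0)).start() and connect any TCP client to the port returned by port(). The blocking select() call keeps the sketch simple; a latency-critical loop would more likely busy-spin on selectNow(), at the cost of fully occupying one core. The JDK has no standard API for pinning a thread to a core, so that step would have to be done with OS tools or a third-party library.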