Java I/O vs. Java new I/O (NIO) with Linux NPTL

My webservers use the usual Java I/O with thread per connection mechanism. Nowadays, they are getting on their knees with increased user (long polling connection). However, the connections are mostly idle. While this can be solved by adding more webservers, I have been trying to do some research on the NIO implementation.

I got a mixed impression about it. I have read about benchmarks where regular I/O with the new NPTL library in Linux outperforms NIO.

What is the real life experience of configuring and using the latest NPTL for Linux with Java I/O? Is there any increased performance?

And on a larger scope question:

What is the maximum number of I/O and blocking threads (that we configure in the Tomcat thread pool) in a standard server class machine (Dell with a quad-core processor) we expect to perform normally (with Linux NPTL library?). What's the impact if the threadpool gets really big, say more than 1000 threads?

Any references and pointers will be very much appreciated.

Solution

Provocative blog posting, "Avoid NIO, get better throughput." Paul Tyma's(2008) blog claims ~5000 threads without any trouble; I've heard folks claim more:

With NPTL on, Sun and Blackwidow JVM 1.4.2 scaled easily to 5000+ threads. Blocking model was consistently 25-35% faster than using NIO selectors. Lot of techniques suggested by EmberIO folks were employed - using multiple selectors, doing multiple (2) reads if the first read returned EAGAIN equivalent in Java. Yet we couldn't beat the plain thread per connection model with Linux NPTL.

I think the key here is to measure the overhead and performance, and make the move to non-blocking I/O only when you know you need to and can demonstrate an improvement. The additional effort to write and maintain non-blocking code should be factored in to your decision. My take is, if your application can be cleanly expressed using synchronous/blocking I/O, DO THAT. If your application is amenable to non-blocking I/O and you won't just be re-inventing blocking I/O badly in application-space, CONSIDER moving to nio based on measured performance needs. I'm amazed when I poke around the google results for this how few of the resources actually cite any (recent) numbers!

Also, see Paul Tyma's presentation slides: The old way is new again. Based on his work at Google, concrete numbers suggest that synchronous threaded I/O is quite scalable on Linux, and consider "NIO is faster" a myth that was true for awhile, but no longer. Some good additional commentary here on Comet Daily. He cites the following (anecdotal, still no solid link to benchmarks, etc...) result on NPTL:

In tests, NPTL succeeded in starting 100,000 threads on a IA-32 in two seconds. In comparison, this test under a kernel without NPTL would have taken around 15 minutes

If you really are running into scalability problems, you may want to tune the thread stack size using XX:ThreadStackSize. Since you mention Tomcat see here.

Finally, if you're bound and determined to use non-blocking I/O, make every effort to build on an existing framework by people who know what they're doing. I've wasted far too much of my own time trying to get an intricate non-blocking I/O solution right (for the wrong reasons).