Search code examples
javanio

sun.rmi.transport.tcp.TCPTransport uses 100% CPU


I'm developping a communication library based on NIO's non-blocking SocketChannels so I can use select to keep my thread low on CPU usage (and getting faster reaction time to other events). SocketChannel are created externally to my thread and added to the list it handles, marking them as non-blocking and adding them to a Selector for READ operations (and WRITE when needed, but that does not happen in my problem).

I have a little Swing application for tests, running locally, that can be either a client or server: the client one connects to the server one and they can send each other messages. Pretty simple and works fine, excepts for the CPU which tops 100% (50% for each jvm) as soon as the connection is established between client and server.

Running jvisualvm shows me that sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run() uses 98% of the application time, counting only 3 method calls!
A forced stack trace shows it's blocking on the read operation on a FilteredInputStream, on a Socket.

I'm a little puzzled as I don't use RMI (though I can understand NIO and RMI can share the "transport" code part). I have seen a few similar questions but each were specifically using RMI, which I'm not. The answers I've seen is that this ConnectionHandler.run() method is responsible for marshalling/unmarshalling things, when here I get 100% CPU without any network traffic. I can only infer an active wait on the sockets but that sounds odd, especially with non-blocking SocketChannel...

Any idea would be greatly appreciated!


Solution

  • I tracked CPU use down to select(int timeout) which returns 0 immediately, regardless of the timeout value. My understanding of this function was it would block until a selected operation pops up, or timeout is reached (as said in the Javadoc).

    However, I found out this other StackOverflow post showing the same problem: OP_CONNECT operation has to be cancelled once connection is accepted.

    Many thanks to @Alexander, and @EJP for clarification about the OP_WRITE/OP_CONNECT similarities.

    Regarding tge RMI part, it was probably due to Eclipse run configurations.