Tags: java, sockets, nio, ioexception, socketchannel

How do I handle ServerSocketChannel.accept() IOException: too many open files in NIO?


I'm having a problem with one of my servers. On Friday morning I got the following IOException:

11/Sep/2015 01:51:39,524 [ERROR] [Thread-1] - ServerRunnable: IOException: 
java.io.IOException: Too many open files
    at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) ~[?:1.7.0_75]
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:241) ~[?:1.7.0_75]
    at com.watersprint.deviceapi.server.ServerRunnable.acceptConnection(ServerRunnable.java:162) [rsrc:./:?]
    at com.watersprint.deviceapi.server.ServerRunnable.run(ServerRunnable.java:121) [rsrc:./:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.7.0_75]

Line 162 of the ServerRunnable class is the ssc.accept() call in the method below.

private void acceptConnection(Selector selector, SelectionKey key) {

    try {
        ServerSocketChannel ssc = (ServerSocketChannel) key.channel();
        SocketChannel sc = ssc.accept();
        socketConnectionCount++;

        /*
         * Test to force device error, for debugging purposes
         */
        if (brokenSocket
                && (socketConnectionCount % brokenSocketNumber == 0)) {

            sc.close();

        } else {

            sc.configureBlocking(false);
            log.debug("*************************************************");
            log.debug("Selector Thread: Client accepted from "
                    + sc.getRemoteAddress());

            SelectionKey newKey = sc.register(selector,
                    SelectionKey.OP_READ);
            ClientStateMachine clientState = new ClientStateMachine();
            clientState.setIpAddress(sc.getRemoteAddress().toString());
            clientState.attachSelector(selector);
            clientState.attachSocketChannel(sc);
            newKey.attach(clientState);

        }

    } catch (ClosedChannelException e) {

        log.error("ClosedChannelException: ", e);
        ClientStateMachine clientState = (ClientStateMachine)key.attachment();
        database.insertFailedCommunication(clientState.getDeviceId(),
                clientState.getIpAddress(),
                clientState.getReceivedString(), e.toString());
        key.cancel();

    } catch (IOException e) {
        log.error("IOException: ", e);
        
    }

}

How should I handle this? From what I've read, the error comes from a Linux setting that limits the number of files a process can have open. Judging from that, and from a related question, it appears that I am not closing sockets correctly (the server is currently serving around 50 clients). Is this a situation where I need a timer to monitor open sockets and time them out after an extended period?

I have some cases where a client can connect and then not send any data once the connection is established. I thought I had handled those cases properly.

It's my understanding that a non-blocking NIO server has very long timeouts. Is it possible that, if I've missed cases like this, they accumulate and result in this error?

This server has been running for three months without any issues. After I go through my code and check for badly handled / missing cases, what's the best way to handle this particular error? Are there other things I should consider that might contribute to this?

Also (maybe this should be a separate question), I have log4j2 configured to send emails for log levels of error and higher, yet I didn't get an email for this error. Are there any reasons why that might be? It usually works, and the error was written to the log file as expected, but I never got an email about it. I should have gotten plenty, as the error occurred every time a connection was established.


Solution

  • You fix your socket leaks. When you get end of stream (EOS, i.e. read() returning -1), or any IOException other than SocketTimeoutException, on a socket, you must close it. In the case of SocketChannels, that means closing the channel. Merely cancelling the key, or ignoring the issue and hoping it will go away, isn't sufficient. The connection has already gone away.

    The fact that you find it necessary to count broken socket connections, and catch ClosedChannelException, already indicates major logic problems in your application. You shouldn't need this. And cancelling the key of a closed channel doesn't provide any kind of a solution.

    "It's my understanding that a non-blocking NIO server has very long timeouts"

    The only timeout a non-blocking NIO server has is the timeout you specify to select(). The timeouts built into the TCP stack are unaffected by whether you are using NIO or non-blocking mode.
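To illustrate the advice above, here is a minimal sketch of closing the channel on end of stream or IOException, and of closing a freshly accepted channel when its setup fails. The class and method names (ChannelCleanup, acceptAndRegister, readOrClose, closeQuietly) are hypothetical and are not part of the question's code. The sketch assumes each key's channel is a SocketChannel registered for OP_READ.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical helper class; none of these names come from the question's code.
final class ChannelCleanup {

    // Accepts one pending connection and registers it for reads.
    // If setup fails after accept(), the new channel is closed so that
    // its file descriptor is released instead of leaking.
    static SelectionKey acceptAndRegister(Selector selector, SelectionKey serverKey)
            throws IOException {
        ServerSocketChannel ssc = (ServerSocketChannel) serverKey.channel();
        SocketChannel sc = ssc.accept();
        if (sc == null) {
            return null; // non-blocking accept with no connection pending
        }
        try {
            sc.configureBlocking(false);
            return sc.register(selector, SelectionKey.OP_READ);
        } catch (IOException | RuntimeException e) {
            try {
                sc.close();
            } catch (IOException closeFailure) {
                e.addSuppressed(closeFailure);
            }
            throw e;
        }
    }

    // Reads into the buffer. Returns the byte count, or -1 if the channel
    // was closed here because of end of stream or an IOException.
    static int readOrClose(SelectionKey key, ByteBuffer buffer) {
        SocketChannel sc = (SocketChannel) key.channel();
        try {
            int n = sc.read(buffer);
            if (n < 0) {
                closeQuietly(key); // peer closed the connection
            }
            return n;
        } catch (IOException e) {
            closeQuietly(key); // the connection is unusable; release it
            return -1;
        }
    }

    // Closing the channel also cancels its keys, so no separate
    // key.cancel() call is needed.
    static void closeQuietly(SelectionKey key) {
        try {
            key.channel().close();
        } catch (IOException ignored) {
            // nothing useful can be done if close() itself fails
        }
    }
}
```

In a selector loop, a caller would treat a -1 return from readOrClose as "this client is gone" and do any per-client bookkeeping (such as recording a failed communication) at that point, rather than in the accept path.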