Tags: scala, playframework, production

Play framework scala - too many open files - how to in production


In the following, I am using Play framework 2.4.0 with Scala 2.11.7.

I am stress-testing a simple Play app with Gatling, injecting 5000 users over 60 seconds. Within a few seconds, the Play server returns the following:

"Failed to accept a connection." and "java.io.IOException: Too many open files in system".

Here is the associated stacktrace:

22:52:48.943 [application-akka.actor.default-dispatcher-12] INFO  play.api.Play$ - Application started (Dev)
22:53:08.939 [New I/O server boss #17] WARN  o.j.n.c.s.nio.AbstractNioSelector - Failed to accept a connection.
java.io.IOException: Too many open files in system
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) ~[na:1.8.0_45]
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422) ~[na:1.8.0_45]
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250) ~[na:1.8.0_45]
        at org.jboss.netty.channel.socket.nio.NioServerBoss.process(NioServerBoss.java:100) [netty-3.10.3.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) [netty-3.10.3.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42) [netty-3.10.3.Final.jar:na]
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.10.3.Final.jar:na]
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [netty-3.10.3.Final.jar:na]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]

I suppose this is due to the system's ulimit (can anyone confirm that?), and if so, my question is the following:

How is this kind of error managed in a production environment? Is it simply a matter of setting a high value with ulimit -n <high_value>?


Solution

  • The most foolproof way to check is to look at the limits of the running Play process (replace PID with its process id):

    cat /proc/PID/limits

    You will see:

    Limit                     Soft Limit           Hard Limit           Units
    Max cpu time              unlimited            unlimited            seconds
    ...
    Max open files            1024                 1024                 files
    ...
    
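    If you want to confirm that the limit is actually being exhausted, one way (on Linux) is to count how many descriptors the process currently has open while the Gatling run is in flight:

    ls /proc/PID/fd | wc -l
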

    To see the limits of your current shell, you can always run:

    ulimit -a

    and get output like:

    ...
    open files                      (-n) 1024
    ...
    

    Changing it is best done system-wide via /etc/security/limits.conf, but you can use ulimit -n to change it for the current shell only.
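
    For example, assuming the server runs as a hypothetical user named playuser, the entries in /etc/security/limits.conf would look something like this (nofile is the open-files limit; 16384 is just an example value):

    # /etc/security/limits.conf
    # raise the open-file limit for the account running the Play app
    playuser    soft    nofile    16384
    playuser    hard    nofile    16384

    Note that these are applied at login (via pam_limits), so the change only takes effect on that user's next session.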

    In terms of how to deal with this situation, there's simply no alternative to having enough file descriptors. Set the limit high; if you still hit it, either you have a descriptor leak or you genuinely serve that much traffic (i.e., you work for Facebook). In production, I believe the general recommendation is 16k.
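
    For a one-off run (for example while reproducing the Gatling test), you can instead raise the limit in the shell that starts the server. The start script name below is hypothetical, and note that a non-root user can only raise the soft limit up to the existing hard limit:

    ulimit -n 16384            # only affects this shell and its children
    ./bin/my-play-app -Dhttp.port=9000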