Tags: java, linux, limit, netty, lsof

Too many open files exception under "unlimited" system


I am seeing a lot of "too many open files" exceptions during the execution of my program. Typically they occur in the following form:

org.jboss.netty.channel.ChannelException: Failed to create a selector.

...
Caused by: java.io.IOException: Too many open files

However, those are not the only exceptions. I have observed similar ones (caused by "too many open files") but those are much less frequent.

Strangely enough, I have set the open-files limit of the screen session (from which I launch my programs) to 1M:

root@s11:~/fabiim-cbench# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 20
file size               (blocks, -f) unlimited
pending signals                 (-i) 16382
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1000000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Moreover, judging by the output of lsof -p, I see no more than 1111 open files (sockets, pipes, files) before the exceptions are thrown.

Question: What is wrong, and/or how can I dig deeper into this problem?

Extra: I am currently integrating Floodlight with bft-smart. In a nutshell, the Floodlight process is the one crashing with "too many open files" exceptions when executing a stress test launched by a benchmark program. This benchmark program maintains 64 TCP connections to the Floodlight process, which in turn should maintain at least 64 * 3 TCP connections to the bft-smart replicas. Both programs use Netty to manage these connections.


Solution

  • First thing to check: can you run ulimit from inside your Java process to make sure that the file limit is the same there? Code like this should work:

    // Requires java.io.InputStream; run this from a method that throws or handles IOException.
    // It spawns a shell, runs "ulimit -a", and copies the output to stdout.
    InputStream is = Runtime.getRuntime()
            .exec(new String[] {"bash", "-c", "ulimit -a"}).getInputStream();
    int c;
    while ((c = is.read()) != -1) {
        System.out.write(c);
    }
    System.out.flush();  // make sure everything is written out
    is.close();
    

    If the limit still shows 1 million, well, you're in for some hard debugging.
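
    You can also cross-check from outside the JVM. This is only a sketch assuming Linux; 12345 stands in for your Java process id:

    # 12345 is a placeholder for the JVM's pid
    # Limits the kernel actually applied to the running process (may differ from your shell's ulimit)
    grep "open files" /proc/12345/limits
    # Number of file descriptors the process holds right now
    ls /proc/12345/fd | wc -l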

    Here are a couple of things that I would look into if I had to debug this:

    1. Are you running out of TCP port numbers? What does netstat -an show when you hit this error? (A sample netstat check is sketched after this list.)

    2. Use strace to find out exactly which system call, with which parameters, is causing this error. EMFILE corresponds to errno 24. (A sample strace invocation is sketched after this list.)

    3. The “Too many open files” EMFILE error can actually be thrown by a number of different system calls for a number of different reasons:

      $ cd /usr/share/man/man2
      $ zgrep -A 2 EMFILE *
      accept.2.gz:.B EMFILE
      accept.2.gz:The per-process limit of open file descriptors has been reached.
      accept.2.gz:.TP
      accept.2.gz:--
      accept.2.gz:.\" EAGAIN, EBADF, ECONNABORTED, EINTR, EINVAL, EMFILE,
      accept.2.gz:.\" ENFILE, ENOBUFS, ENOMEM, ENOTSOCK, EOPNOTSUPP, EPROTO, EWOULDBLOCK.
      accept.2.gz:.\" In addition, SUSv2 documents EFAULT and ENOSR.
      dup.2.gz:.B EMFILE
      dup.2.gz:The process already has the maximum number of file
      dup.2.gz:descriptors open and tried to open a new one.
      epoll_create.2.gz:.B EMFILE
      epoll_create.2.gz:The per-user limit on the number of epoll instances imposed by
      epoll_create.2.gz:.I /proc/sys/fs/epoll/max_user_instances
      eventfd.2.gz:.B EMFILE
      eventfd.2.gz:The per-process limit on open file descriptors has been reached.
      eventfd.2.gz:.TP
      execve.2.gz:.B EMFILE
      execve.2.gz:The process has the maximum number of files open.
      execve.2.gz:.TP
      execve.2.gz:--
      execve.2.gz:.\" document ETXTBSY, EPERM, EFAULT, ELOOP, EIO, ENFILE, EMFILE, EINVAL,
      execve.2.gz:.\" EISDIR or ELIBBAD error conditions.
      execve.2.gz:.SH NOTES
      fcntl.2.gz:.B EMFILE
      fcntl.2.gz:For
      fcntl.2.gz:.BR F_DUPFD ,
      getrlimit.2.gz:.BR EMFILE .
      getrlimit.2.gz:(Historically, this limit was named
      getrlimit.2.gz:.B RLIMIT_OFILE
      inotify_init.2.gz:.B EMFILE
      inotify_init.2.gz:The user limit on the total number of inotify instances has been reached.
      inotify_init.2.gz:.TP
      mmap.2.gz:.\" SUSv2 documents additional error codes EMFILE and EOVERFLOW.
      mmap.2.gz:.SH AVAILABILITY
      mmap.2.gz:On POSIX systems on which
      mount.2.gz:.B EMFILE
      mount.2.gz:(In case no block device is required:)
      mount.2.gz:Table of dummy devices is full.
      open.2.gz:.B EMFILE
      open.2.gz:The process already has the maximum number of files open.
      open.2.gz:.TP
      pipe.2.gz:.B EMFILE
      pipe.2.gz:Too many file descriptors are in use by the process.
      pipe.2.gz:.TP
      shmop.2.gz:.\" SVr4 documents an additional error condition EMFILE.
      shmop.2.gz:
      shmop.2.gz:In SVID 3 (or perhaps earlier)
      signalfd.2.gz:.B EMFILE
      signalfd.2.gz:The per-process limit of open file descriptors has been reached.
      signalfd.2.gz:.TP
      socket.2.gz:.B EMFILE
      socket.2.gz:Process file table overflow.
      socket.2.gz:.TP
      socketpair.2.gz:.B EMFILE
      socketpair.2.gz:Too many descriptors are in use by this process.
      socketpair.2.gz:.TP
      spu_create.2.gz:.B EMFILE
      spu_create.2.gz:The process has reached its maximum open files limit.
      spu_create.2.gz:.TP
      timerfd_create.2.gz:.B EMFILE
      timerfd_create.2.gz:The per-process limit of open file descriptors has been reached.
      timerfd_create.2.gz:.TP
      truncate.2.gz:.\" error conditions EMFILE, EMULTIHP, ENFILE, ENOLINK.  SVr4 documents for
      truncate.2.gz:.\" .BR ftruncate ()
      truncate.2.gz:.\" an additional EAGAIN error condition.
      

      If you check out all of these manpages by hand, you may find something interesting. For example, it is notable that epoll_create, the underlying system call used by Java NIO selectors, will return EMFILE “Too many open files” if

      The per-user limit on the number of epoll instances imposed by /proc/sys/fs/epoll/max_user_instances was encountered. See epoll(7) for further details.

      Now, that filename doesn’t actually exist on my system, but there are limits defined in files under /proc/sys/fs/epoll and /proc/sys/fs/inotify that you might be hitting, especially if you’re running multiple instances of the same test on the same machine. Figuring out whether that’s the case is a chore in itself; you could start by checking syslog for any messages.
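
      For instance, something along these lines; this is a sketch, since the exact files under /proc and the syslog location vary by kernel and distribution:

      # Which epoll/inotify limits does this kernel expose, and what are their current values?
      grep . /proc/sys/fs/epoll/* /proc/sys/fs/inotify/* 2>/dev/null
      # System-wide file handle ceiling and current usage
      grep . /proc/sys/fs/file-max /proc/sys/fs/file-nr
      # Any kernel complaints about running out of handles? (syslog path is distribution-specific)
      grep -iE "too many open files|file-max" /var/log/syslog | tail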
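
    Regarding check 1 above, a netstat sketch (flags vary slightly between distributions):

    # Count connections per TCP state; a large TIME_WAIT pile-up points at port exhaustion
    netstat -ant | awk 'NR>2 {print $6}' | sort | uniq -c | sort -rn
    # Ephemeral port range available for outgoing connections
    cat /proc/sys/net/ipv4/ip_local_port_range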
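
    And for check 2, a sample strace invocation (12345 again stands in for the Java process id; -f follows all of the JVM's threads):

    # Log descriptor- and network-related system calls made by the running process (12345 is a placeholder pid)
    strace -f -e trace=desc,network -o /tmp/trace.log -p 12345
    # Afterwards, look for the call that fails with EMFILE
    grep EMFILE /tmp/trace.log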

    Good luck!