java performance lsof

Find which thread is causing a "Too many open files" error, and why lsof output shows duplicate node IDs


Our Java application throws a "Too many open files" error after running for a long time. While debugging, we saw in the lsof output that the process has a very large number of file descriptors (FDs) open.

# lsof -p pid | grep "pipe" | wc -l

698962

# lsof -p pid | grep "anon_inode" | wc -l

349481

-------------- Sample lsof output --------------

COMMAND   PID  USER   FD  TYPE             DEVICE SIZE/OFF       NODE NAME
java    23994  app 464u  0000                0,9        0       3042 anon_inode
java    23994  app 465u  0000                0,9        0       3042 anon_inode
java    23994  app 466r  FIFO                0,8      0t0  962495977 pipe
java    23994  app 467w  FIFO                0,8      0t0  962495977 pipe
java    23994  app 468r  FIFO                0,8      0t0  963589016 pipe
java    23994  app 469w  FIFO                0,8      0t0  963589016 pipe
java    23994  app 470u  0000                0,9        0       3042 anon_inode
java    23994  app 471u  0000                0,9        0       3042 anon_inode

How can we find the root cause of so many open FDs of type FIFO and 0000? Our application does very little file reading or writing. It does read a large number of TCP messages from streams using the Apache MINA framework, which uses NIO internally.

These are my questions:

  1. We checked the /proc/pid/task/ folder, which contains many subfolders. Do they correspond to thread IDs? jstack reports 141 threads, whereas this folder has 209 subfolders.
  2. How can we find which thread is causing the FD leak? In our case, most of the subfolders under task list many FDs, i.e. each /proc/pid/task/threadid/fd folder has many entries.
  3. What are the possible reasons for the pipe and anon_inode entries in the lsof output?
  4. What does FD type 0000 mean?
  5. Every anon_inode entry has the same node ID, 3042. What does this mean?

Solution

  • The most likely cause is that you are opening resources and not closing them properly. Make sure every resource is released with try-with-resources or a try-finally block.

    To find the leak, route all your I/O through a single class that records every open and close, possibly along with the stack trace of the open call. You can then query it to see which resources are still open and where they were opened.
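
The tracking idea above could look roughly like the following. This is only a minimal sketch, not MINA's API or anything from the original application: the names ResourceTracker, Tracked, track, openCount and openSites are hypothetical, and it assumes every resource you care about (for example an NIO Selector) implements java.io.Closeable and is created through track().

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: remembers where every tracked resource was opened. */
final class ResourceTracker {
    // Each still-open wrapper maps to a Throwable captured when it was opened.
    private final Map<Closeable, Throwable> open = new ConcurrentHashMap<>();

    /** Wrapper whose close() closes the resource and unregisters it. */
    static final class Tracked<T extends Closeable> implements Closeable {
        private final T resource;
        private final ResourceTracker owner;

        private Tracked(T resource, ResourceTracker owner) {
            this.resource = resource;
            this.owner = owner;
        }

        T get() {
            return resource;
        }

        @Override
        public void close() throws IOException {
            try {
                resource.close();
            } finally {
                // Unregister even if the underlying close() failed.
                owner.open.remove(this);
            }
        }
    }

    /** Registers a resource together with the opening thread and stack trace. */
    <T extends Closeable> Tracked<T> track(T resource) {
        Tracked<T> tracked = new Tracked<>(resource, this);
        String thread = Thread.currentThread().getName();
        open.put(tracked, new Throwable("opened by thread " + thread));
        return tracked;
    }

    /** Number of tracked resources that have not been closed yet. */
    int openCount() {
        return open.size();
    }

    /** One stack trace (as text) per resource that is still open. */
    List<String> openSites() {
        List<String> sites = new ArrayList<>();
        for (Throwable origin : open.values()) {
            StringWriter buffer = new StringWriter();
            origin.printStackTrace(new PrintWriter(buffer, true));
            sites.add(buffer.toString());
        }
        return sites;
    }

    public static void main(String[] args) throws IOException {
        ResourceTracker tracker = new ResourceTracker();

        // Closed properly: try-with-resources calls close() even on exceptions.
        try (Tracked<Selector> selector = tracker.track(Selector.open())) {
            selector.get().selectNow();
        }

        // Deliberately never closed, to show what a leak report looks like.
        tracker.track(new ByteArrayInputStream(new byte[0]));

        System.out.println("still open: " + tracker.openCount());
        tracker.openSites().forEach(System.out::println);
    }
}
```

Each entry returned by openSites() starts with the name of the thread that opened the resource, followed by the stack trace of the track() call, which is usually enough to locate the code path that forgets to close. Capturing a stack trace on every open has a cost, so a tracker like this is probably best enabled only while hunting the leak.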