Search code examples
javamemoryioapache-tailer

Gigantic virtual memory in java when tailing files


I've made a program that does like "tail -f" on a number of log files on a machine, using Apache Tailer from commons IO. Basically it runs in a thread, opens the file as a RandomAccessFile, checks its length, seeks to the end etc. It sends all log lines collected to a client.

The somewhat uncomfortable thing about it, is that on Linux it can show an enormous amount of VIRT memory. Right now it says 16.1g VIRT (!!) and 203m RES.

I have read up a little on virtual memory and understood that it's often "nothing to worry about".. But still, 16 GB? Is it really healthy?

When I look at the process with pmap, none of the log file names are shown so I guess they are not memory mapped.. And I read (man pmap) that "[ anon ]" in the "Mapping" column of pmap output means "allocated memory". Now what does that mean? :)

However, pmap -x shows:

Address           Kbytes     RSS   Dirty Mode   Mapping
...
----------------  ------  ------  ------
total kB        16928328  208824  197096

..so I suppose it's not residing in RAM, after all.. But how does it work memory-wise when opening a file like this, seeking to the end of it, etc?

Should I worry about all those GB of VIRT memory ? It "watches" 84 different log files right now, and the total size of these on disk are 31414239 bytes.

EDIT: I just tested it on another, less "production-like", Linux machine and did not get the same numbers. VIRT got to ~2,5 GB at most there. I found that some of the default JVM settings were different (checked with "java -XX:+PrintFlagsFinal -version"):

Value              Small machine    Big machine
InitialHeapSize    62690688         2114573120
MaxHeapSize        1004535808       32038191104
ParallelGCThreads  2                13

..So, uhm.. I guess it grabs more heap on the big machine, since the max limit is (way) higher? And I also guess it's a good idea to always specify those values explicitly..


Solution

  • A couple of things:

    • Each Tailer instance will have its own thread. And each thread has a stack. By default (on a 64bit JVM) thread stacks are 1Mb each, so you will be using 84Mb for stacks. You might consider reducing that using the -Xss option at launch time.

    • A large virt size is not necessarily bad. But if it translates into a demand on physical memory ... and you don't have that much ... then that really is bad.


    Hmm I actually run it without any JVM args at all right now. Is that good or bad? :-)

    I understand now. Yes it is bad. The JVM's default heap sizing on a large 64bit machine are far more than you really need.

    Assuming that your application is only doing simple processing of the log lines, I recommend that you set the max heap size to a relatively small size (e.g. 64Mb). That way if you do get a leak it won't impact on the rest of your system by gobbling up lots of real memory.