I have two sets of HProf dumps, one for a large sample and one for a smaller sample - both come from a very small subset of the huge data set I have. I'm trying to figure out the bottleneck in my approach.
Here is my heap allocation data for the large sample (http://pastebin.com/PEH8yR3v) and the small sample (http://pastebin.com/aR8ywkDH).
I notice that char[] takes most of my memory. Also, the percentage of memory taken by char[] varies between the small and the large sample runs, and I don't know how it will vary when I profile my whole data set.
But the important question I'm concerned with is this: when I run this program (read, parse/process, write) on 3 GB of input data, writing back 10 GB of output, it uses around 7 GB of main memory. Except for one list whose size is no more than 1 GB, I don't store anything in memory - this is a plain read, process, write pipeline.
This is my approach:
    read a file in through a string iterator
    for each line in ip_file:
        op_buffer = myFunction(line)
        write op_buffer to op_file

Perform this for all 20K files in my input data (a rough Scala sketch of this loop follows).
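Concretely, the driver looks roughly like this in Scala (the paths are placeholders and myFunction is stubbed so the sketch is self-contained; the real logic is shown further down):

    import java.io.PrintWriter
    import scala.io.Source

    object Pipeline {
      // Stub so this sketch compiles on its own; the real logic is sketched below.
      def myFunction(line: String): String = line

      def processFile(inputPath: String, outputPath: String): Unit = {
        val source = Source.fromFile(inputPath)
        val writer = new PrintWriter(outputPath)
        try {
          // Stream line by line: only the current line and its result
          // need to be held in memory at any time.
          for (line <- source.getLines()) {
            val opBuffer = myFunction(line)
            writer.println(opBuffer)
          }
        } finally {
          source.close()
          writer.close()
        }
      }

      def main(args: Array[String]): Unit = {
        // Repeated for all ~20K input files (file discovery omitted here).
        processFile("input.txt", "output.txt")
      }
    }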
    def myFunction(line)
    {
        var : String = null
        for each word in line
        {
            var class_obj = new Classname(word)
            op_line += class_obj.result
        }
        return op_line
    }
Since the objects created inside myFunction will go out of scope at the end of myFunction, I don't take care to delete/free them. Do you see any bottlenecks?
"Since the objects created inside myFunction will go out of scope at the end of myFunction"
No, they won't be freed. This is not C++: all objects are created on the heap, nothing is deallocated when a method returns, and an object only goes away once it is unreachable and the garbage collector reclaims it.
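To illustrate with made-up names (not your code): a local object merely becomes eligible for collection once the method returns, while anything reachable from a longer-lived reference is never collected at all.

    object ScopeExample {
      private var retained = ""              // field: anything reachable from here is never collected

      def process(line: String): String = {
        val local = new StringBuilder(line)  // local: eligible for GC only after process returns
        retained += line                     // this, however, keeps growing across calls
        local.toString
      }
    }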
Also, you haven't declared op_line anywhere in your pseudocode, so I assume it is being retained between method calls, and I'd guess that is your memory leak. There is no way you should have a single character array of more than 100 million bytes, which is what the "small" heap dump says you have.
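If that's the case, the fix is to declare the accumulator inside the method, and preferably to use a StringBuilder rather than repeated String concatenation, which allocates a fresh String (and its backing char[]) on every +=. A minimal sketch, assuming Classname exposes a result: String as your pseudocode suggests (the Classname stub here is only a stand-in):

    // Stand-in for the question's Classname, which isn't shown.
    class Classname(word: String) {
      def result: String = word
    }

    object Fixed {
      def myFunction(line: String): String = {
        // Local accumulator: unreachable, and therefore collectible, as soon as this call returns.
        val opLine = new StringBuilder
        for (word <- line.split("\\s+")) {
          val classObj = new Classname(word)
          opLine.append(classObj.result)
        }
        opLine.toString
      }
    }

With the accumulator local, each call's intermediate strings become garbage as soon as the call returns, so the live set should stay close to your 1 GB list plus the line currently being processed.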