Tags: java, garbage-collection, java.util.scanner, heap-memory

Reading in large text files, problems with the garbage collector and Scanner object


I am writing a program that needs to read very large files (about 150 MB of text). I get an OutOfMemoryError when I try to read files larger than 50 MB. Here is an excerpt from my code:

    if (returnVal == JFileChooser.APPROVE_OPTION) {
        file = fc.getSelectedFile();
        gui.setTitle("Fluent Helper - " + file.toString());
        try {
            scanner = new Scanner(new FileInputStream(file));
            gui.getStatusLabel().setText("Reading Faces...");
            while (scanner.hasNext()) {
                count++;
                if (count < 1000000) {
                    System.gc();
                    count = 0;
                }
                readStr = scanner.nextLine() + "\n";
                if (readStr.equals("#\n")) {
                    isFaces = false;
                    gui.getStatusLabel().setText("Reading Cells...");
                } else if (isFaces) {
                    faces.add(new Face(readStr));
                } else {
                    cells.add(new Cell(readStr));
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                scanner.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        System.out.println("file selected");
    } else {
        System.out.println("file not selected");
    }

The small block that calls the garbage collector every so many reads is something I added to solve the memory problem, but it doesn't work. Instead, the program hangs and never reaches the cells portion of the file (which should happen in under a second). Here is the block:

    if (count < 1000000) {
        System.gc();
        count = 0;
    }

My guess is that the Scanner's file pointer is somehow being garbage collected, but I really have no idea. Launching the program with a larger heap is not really an option for me: the program should be usable by people without much computer knowledge.

I would like a way to read the file in without problems, whether that means better memory management, fixing how I use the Scanner, or a more efficient way of reading the file. Thanks, everyone.
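The hang most likely comes from the quoted block itself. `count` is reset to 0 every time the branch runs, so after `count++` it is always 1 and `count<1000000` is true on every line. `System.gc()` therefore runs once per line read, and an explicit `System.gc()` typically triggers a full collection of the whole heap, which gets more expensive as `faces` grows. The sketch below is a standalone, hypothetical harness (the class `GcCounterDemo` and the method `branchHits` are made up for illustration) that replays only the counter logic, with the `System.gc()` call replaced by a hit counter, and compares it with the presumably intended `count > 1000000`.

```java
// Hypothetical harness: replays the counter logic from the question
// without actually calling System.gc().
public class GcCounterDemo {

    // Simulates `lines` iterations of the read loop and returns how many
    // times the branch that calls System.gc() in the original would fire.
    static int branchHits(int lines, boolean asWritten) {
        int count = 0;
        int hits = 0;
        for (int i = 0; i < lines; i++) {
            count++;
            // asWritten == true uses the question's condition; false uses
            // the presumably intended "every million reads" condition.
            boolean fire = asWritten ? count < 1000000 : count > 1000000;
            if (fire) {
                hits++;      // the original calls System.gc() here
                count = 0;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int lines = 3_000_000; // arbitrary line count for the demo
        System.out.println("as written: " + branchHits(lines, true));
        System.out.println("with '>':   " + branchHits(lines, false));
    }
}
```

Run as-is, it prints `as written: 3000000` and `with '>':   2`: the original condition fires on every one of the 3,000,000 simulated lines, the intended one only twice. Even with the comparison fixed, the explicit call is unnecessary, since the JVM collects garbage on its own when it needs to.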


Solution

  • The GC runs automatically when it is needed, so calling it yourself will only slow your application down.

    The problem is the amount of data you are retaining:

        } else if (isFaces) {
            faces.add(new Face(readStr));
        } else {
            cells.add(new Cell(readStr));
        }

    Together, these objects exceed the maximum heap size the JVM is running with. Can you try a 1 GB maximum heap (-mx1g) to see whether that makes a difference?

    BTW: Why are you adding a \n to the end of each line?
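Putting the answer's points together, a minimal sketch of the same read loop might use a `BufferedReader`, drop the `System.gc()` calls, and keep each line without the extra `"\n"` (so the separator test becomes `line.equals("#")`). Assumptions, all hypothetical: `Face` and `Cell` are stand-ins that only keep the line (the real classes aren't shown), `isFaces` starts as `true`, the file is UTF-8, and the Swing status-label updates are left out. It also prints `Runtime.getRuntime().maxMemory()`, which reports the heap ceiling the JVM actually got, so you can check whether a flag like `-mx1g` took effect. This does not shrink what `faces` and `cells` retain; if the parsed objects themselves don't fit, a larger heap or storing less per line is still needed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class MeshReader {

    // Hypothetical stand-ins for the question's Face and Cell classes.
    static final class Face {
        final String line;
        Face(String line) { this.line = line; }
    }

    static final class Cell {
        final String line;
        Cell(String line) { this.line = line; }
    }

    // Reads faces until a line consisting of "#", then cells.
    // No explicit GC calls and no "\n" appended to each line.
    static void read(Path file, List<Face> faces, List<Cell> cells) throws IOException {
        boolean isFaces = true; // assumption: the file starts with faces
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.equals("#")) {
                    isFaces = false; // separator between the two sections
                } else if (isFaces) {
                    faces.add(new Face(line));
                } else {
                    cells.add(new Cell(line));
                }
            }
        } // try-with-resources closes the reader even if reading fails
    }

    public static void main(String[] args) throws IOException {
        // Heap ceiling the JVM was started with (reflects -mx / -Xmx if given).
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("max heap: " + maxMb + " MB");

        List<Face> faces = new ArrayList<>();
        List<Cell> cells = new ArrayList<>();
        read(Paths.get(args[0]), faces, cells);
        System.out.println(faces.size() + " faces, " + cells.size() + " cells");
    }
}
```

For example, `java -mx1g MeshReader mesh.txt` (the file name is a placeholder) prints the heap limit followed by the face and cell counts. `BufferedReader.readLine()` strips the line terminator, including `\r\n`, so the separator check works for Windows-style files as well.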