Tags: java, file, heap-memory, space

Java heap space error on large files with String.split


I get a heap space error at the line marked below when the program runs on another machine, but it runs fine on my machine. I can't change the properties of the other machine. How can I solve this problem without using Scanner?

Is " " the correct argument to String.split for splitting each line into pieces at the spaces?

Input file (sample):

U 1 234.003 30 40 50 true
T 2 234.003 10 60 40 false
Z 3 17234.003 30 40 50 true
M 4 0.500 30 40 50 true

/* 1000000+ lines */

Error:

java.lang.OutOfMemoryError: Java heap space
    at java.base/java.util.Arrays.copyOfRange(Arrays.java:3821)
    at java.base/java.lang.StringLatin1.newString(StringLatin1.java:764)
    at java.base/java.lang.String.substring(String.java:1908)
    at java.base/java.lang.String.split(String.java:2326)
    at java.base/java.lang.String.split(String.java:2401)
    at project.FileR(Fimporter.java:99)
Code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public static DataBase File(String filename) throws IOException {

    DataBase DB = new DataBase();

    // try-with-resources closes the reader when the loop ends or an exception is thrown
    try (BufferedReader fs = new BufferedReader(new FileReader(filename), 64 * 1024)) {

        String line;
        while ((line = fs.readLine()) != null) {

            String[] wrds = line.split(" ");     /* this is line 99 in the error-message */

            // wrds[0] is the letter column (U, T, Z, M) and is not stored
            int hash  = Integer.parseInt(wrds[1]);
            double B  = Double.parseDouble(wrds[2]);
            int C     = Integer.parseInt(wrds[3]);
            int D     = Integer.parseInt(wrds[4]);
            // note: in the sample lines wrds[5] is a number (50, 40); the true/false column is wrds[6]
            boolean E = Boolean.parseBoolean(wrds[5]);

            // hash is the key for all values B C D E in DataBase DB
            DB.listB.put(hash, B);
            DB.listC.put(hash, C);
            DB.listD.put(hash, D);
            DB.listE.put(hash, E);
        }
    }
    return DB;
}


Solution

  • How can I solve this problem without using Scanner?

    Scanner is not the issue.

    If you are getting OutOfMemoryErrors (OOMEs) with this code, the most likely root cause is the following:

    DB.listB.put(hash,B);
    DB.listC.put(hash,C);
    DB.listD.put(hash,D);
    DB.listE.put(hash,E);
    

    You appear to be loading all of your data into four maps. (You haven't shown us the relevant code, so I am making an educated guess here.)

    My second guess is that your input files are very large, and the amount of memory needed to hold them in the above data structures is simply too large for the "other" machine's heap.
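To check this second guess, you could print the heap limits the JVM actually gets on each machine. This is a minimal sketch that uses only standard `Runtime` methods; the class name `HeapInfo` is made up for illustration.

```java
public class HeapInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;

        // maxMemory(): the largest size the heap may grow to (from -Xmx, or the JVM default)
        System.out.println("max heap:   " + rt.maxMemory() / mb + " MB");

        // totalMemory(): heap currently reserved; freeMemory(): the unused part of it
        System.out.println("total heap: " + rt.totalMemory() / mb + " MB");
        System.out.println("free heap:  " + rt.freeMemory() / mb + " MB");
        System.out.println("used heap:  " + (rt.totalMemory() - rt.freeMemory()) / mb + " MB");
    }
}
```

Running it with the same java command line as the real program on both machines shows whether the "other" machine's maximum heap is smaller. On HotSpot JVMs, `java -XshowSettings:vm -version` reports a similar estimate without any code.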

    The fact that the OOMEs are occurring in a String.split call does not indicate a problem in split per se. It is just the proverbial "straw that broke the camel's back". The root cause of the problem is what you are doing with the data after splitting it.
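On the question's second point: `split(" ")` works for the sample lines, because its argument is a regular expression that matches exactly one space. This sketch (the class name `SplitDemo` is hypothetical) shows how it differs from `"\\s+"`, which only matters if the file may contain repeated spaces or tabs.

```java
import java.util.Arrays;

public class SplitDemo {
    public static void main(String[] args) {
        String line = "U 1 234.003 30 40 50 true";

        // " " matches exactly one space, so each single space separates two tokens
        String[] tokens = line.split(" ");
        System.out.println(tokens.length);                       // 7

        // Two consecutive spaces leave an empty token between them
        System.out.println(Arrays.toString("U  1".split(" ")));  // [U, , 1]

        // "\\s+" treats any run of whitespace (spaces, tabs) as one separator
        System.out.println(Arrays.toString("U  1\t234.003".split("\\s+"))); // [U, 1, 234.003]
    }
}
```

Either variant allocates a new array and new token strings per line; switching between them does not change the memory picture described above.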


    Possible solutions / workarounds:

    1. Increase the heap size on the "other" machine. If you haven't set the -Xmx or -Xms options, the JVM uses its default maximum heap size, which is typically 1/4 of the physical memory.

      Read the documentation for the java command to understand what -Xmx and -Xms do and how to set them.

    2. Use more memory-efficient data structures:

      • Create a class to represent a tuple of the B, C, D and E values. Then replace the four maps with a single map of these tuples.

      • Use a more memory-efficient Map type (for example, one keyed by primitive int rather than boxed Integer).

      • Consider using a sorted array of tuples (including the hash) and using binary search to look them up.

    3. Redesign your algorithms so that they don't need all of the data in memory at the same time; e.g. split the input into smaller files and process them separately. (This may not be possible for your problem.)
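The first suggestion under point 2 (one map of tuples instead of four maps) might look like this. It is a sketch under assumptions: `TupleMapSketch` and `Entry` are hypothetical names, and the values are illustrative ones based on the sample lines.

```java
import java.util.HashMap;
import java.util.Map;

public class TupleMapSketch {

    // Hypothetical tuple holding the B, C, D and E values stored for one hash
    static final class Entry {
        final double b;
        final int c;
        final int d;
        final boolean e;

        Entry(double b, int c, int d, boolean e) {
            this.b = b;
            this.c = c;
            this.d = d;
            this.e = e;
        }
    }

    public static void main(String[] args) {
        // One map replaces listB, listC, listD and listE: each line costs one
        // boxed key and one map node instead of four of each.
        Map<Integer, Entry> db = new HashMap<>();
        db.put(1, new Entry(234.003, 30, 40, true));
        db.put(2, new Entry(234.003, 10, 60, false));

        Entry row = db.get(2);
        System.out.println(row.b + " " + row.c + " " + row.d + " " + row.e); // 234.003 10 60 false
    }
}
```

The fields are primitives, so the boxed Double and Integer values that the four separate maps needed go away as well.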
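The sorted-array suggestion under point 2 can be sketched with one primitive array per column, all ordered by hash, and `Arrays.binarySearch` for lookups. This is one possible layout, not the only one. The class name is hypothetical, the sample values are illustrative, and it assumes hashes are unique: with duplicates, both rows are kept and the search returns one of them.

```java
import java.util.Arrays;

public class SortedArraySketch {

    // One primitive array per column, all sorted by hash; no per-row objects
    final int[] hashes;
    final double[] b;
    final int[] c;
    final int[] d;
    final boolean[] e;

    // Takes unsorted columns (row i = one input line) and reorders every column by hash
    SortedArraySketch(int[] hash, double[] b, int[] c, int[] d, boolean[] e) {
        int n = hash.length;
        // Pack (hash, original row) into a long so one primitive sort orders rows by hash
        long[] keys = new long[n];
        for (int i = 0; i < n; i++) {
            keys[i] = ((long) hash[i] << 32) | i;
        }
        Arrays.sort(keys);

        this.hashes = new int[n];
        this.b = new double[n];
        this.c = new int[n];
        this.d = new int[n];
        this.e = new boolean[n];
        for (int k = 0; k < n; k++) {
            int row = (int) keys[k];                 // low 32 bits: original row index
            this.hashes[k] = (int) (keys[k] >> 32);  // high 32 bits: the hash
            this.b[k] = b[row];
            this.c[k] = c[row];
            this.d[k] = d[row];
            this.e[k] = e[row];
        }
    }

    // Binary search on the sorted hash column; returns the row, or -1 if absent
    int indexOf(int hash) {
        int k = Arrays.binarySearch(hashes, hash);
        return k >= 0 ? k : -1;
    }

    public static void main(String[] args) {
        SortedArraySketch t = new SortedArraySketch(
                new int[] {3, 1, 4, 2},
                new double[] {17234.003, 234.003, 0.5, 234.003},
                new int[] {30, 30, 30, 10},
                new int[] {40, 40, 40, 60},
                new boolean[] {true, true, true, false});

        int row = t.indexOf(2);
        System.out.println(t.c[row] + " " + t.d[row] + " " + t.e[row]); // 10 60 false
        System.out.println(t.indexOf(99));                               // -1
    }
}
```

Lookups cost O(log n) instead of a hash map's expected O(1), which is the trade-off for storing each line in a few bytes per column.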
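For point 3, one way to split the input is to partition it by key, so that all lines for a given hash land in the same smaller file and each file can later be loaded on its own. This is a sketch under assumptions: it partitions by `floorMod(hash, parts)` (a choice made here, not something the answer prescribes), takes the hash from the second space-separated column as the question's code does, and uses hypothetical names (`SplitFileSketch`, `partitionByHash`, `part-<k>.txt`).

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class SplitFileSketch {

    // Streams `input` line by line into `parts` files in `outDir`.
    // Only the current line is held in memory. Returns the part files in order.
    static List<Path> partitionByHash(Path input, Path outDir, int parts) throws IOException {
        List<Path> paths = new ArrayList<>();
        BufferedWriter[] writers = new BufferedWriter[parts];
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            for (int k = 0; k < parts; k++) {
                Path p = outDir.resolve("part-" + k + ".txt");
                paths.add(p);
                writers[k] = Files.newBufferedWriter(p, StandardCharsets.UTF_8);
            }
            String line;
            while ((line = in.readLine()) != null) {
                int hash = Integer.parseInt(line.split(" ")[1]);
                // floorMod keeps the part index non-negative for negative hashes
                BufferedWriter w = writers[Math.floorMod(hash, parts)];
                w.write(line);
                w.newLine();
            }
        } finally {
            for (BufferedWriter w : writers) {
                if (w != null) {
                    w.close();
                }
            }
        }
        return paths;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("parts");
        Path input = dir.resolve("input.txt");
        Files.write(input, List.of(
                "U 1 234.003 30 40 50 true",
                "T 2 234.003 10 60 40 false",
                "Z 3 17234.003 30 40 50 true",
                "M 4 0.500 30 40 50 true"));
        for (Path p : partitionByHash(input, dir, 2)) {
            System.out.println(p.getFileName() + ": " + Files.readAllLines(p));
        }
    }
}
```

Whether this helps depends on the later processing: it pays off only if each lookup or computation needs the lines of one part at a time.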