Tags: java, json, performance, performance-testing

Increase the performance of reading many large files


I am reading several files, each containing around 10,000,000 lines. The first few files are read quickly, but performance degrades by about the 7th file. In fact, it becomes so inefficient that I had to run with -XX:-UseGCOverheadLimit.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.HashMap;

import org.json.JSONObject;

HashMap<String, String> hm = new HashMap<>();
File dir2 = new File(direc);
File[] directoryListing2 = dir2.listFiles();

if (directoryListing2 != null) {
    for (File child2 : directoryListing2) {
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader br2 = new BufferedReader(new FileReader(child2))) {
            String line2;
            while ((line2 = br2.readLine()) != null) {
                if (!line2.isEmpty()) {
                    // each non-empty line is a JSON object like
                    // {"name":"...","surname":"..."}
                    JSONObject thedata = new JSONObject(line2);
                    String name = thedata.getString("name");
                    String surname = thedata.getString("surname");
                    hm.put(name, surname);
                }
            }
        }
    }
}

Why does the performance degrade so much and how can I make it more efficient?


Solution

  • You are inserting 10 million entries into your map for each file, and the map is never cleared between files, so the entries accumulate. Each entry uses at least 28 bytes (assuming a surname of one character; more if the surname is longer).

    28 bytes is a rough estimate: 4 bytes for each String reference (two per entry = 8 bytes), 16 bytes for the one-character String object, and 4 bytes for the reference to the entry in the map. It may take more, but that gives the order of magnitude.

    So each file adds at least 280 MB to the heap (10,000,000 entries × 28 bytes), and after 7 files that is roughly 2 GB. And that assumes all the values are one character long, which they presumably are not.

    You need a maximum heap size that is large enough; otherwise this code will put a lot of pressure on the garbage collector and may eventually run out of memory (see the example below).

    As mentioned in the comments, you could also presize the map to avoid excessive rehashing as it grows (a sketch follows below).
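
    For example, assuming the data genuinely needs around 2 GB of live objects, you could raise the maximum heap with the standard -Xmx JVM flag when launching the program (the 4g value and the app.jar name below are placeholders, not values from the question):

        java -Xmx4g -jar app.jar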
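
    As a minimal sketch of presizing, assuming roughly 10,000,000 entries per file and 7 files all kept in one map (the 70,000,000 total is an assumption based on the question): a HashMap rehashes once its size exceeds capacity × load factor (0.75 by default), so choosing an initial capacity above expectedEntries / 0.75 avoids rehashing entirely.

        // assumption: ~10,000,000 entries per file × 7 files, all kept in one map
        int expectedEntries = 70_000_000;
        // HashMap resizes when size > capacity * loadFactor (default 0.75),
        // so pick an initial capacity just above expectedEntries / 0.75
        HashMap<String, String> hm = new HashMap<>((int) (expectedEntries / 0.75f) + 1);

    This avoids the repeated copy-and-rehash passes that would otherwise happen several times while the map grows to tens of millions of entries.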