Search code examples
javaexternal-sorting

How to read a file in chunks that is to large to be stored in memory


I'm practicing and I ran across a problem about sorting numbers from a file that is to large to fit in memory. I don't know how to do this so I thought I would give it a try. I ended up finding external sort, and I'm basically just trying to take the concept and code a solution to this problem. The text file that I'm practicing with is not that large to fit into memory; I'm just trying to to learn how to accomplish something like this. So Far I am reading from the file in 3 chunks of 500 lines each, sorting the chunks, and then writing the results chunks to their own file. This is working... although I'm not sure my implementation is how the external sort process is intended to be implemented:

import java.util.*;
import java.io.*;

public class ExternalSort{

 public static void main(String[] args) {
    File file = new File("Practice/lots_of_numbers.txt");
    final int NUMBER_OF_CHUNKS = 3;
    final int AMOUNT_PER_CHUNK = 500;
    int numbers[][] = new int[NUMBER_OF_CHUNKS][AMOUNT_PER_CHUNK];

    try{
     Scanner scanner = new Scanner(file);

     for(int i = 0; i < NUMBER_OF_CHUNKS; i++){
       //Just creating a new file name for each chunk
       StringBuilder sortedFileName = new StringBuilder().append("sortedFile").append(i).append(".txt");

       for(int j = 0; j < AMOUNT_PER_CHUNK; j++){
         numbers[i][j] = Integer.parseInt(scanner.nextLine());
       }
       Arrays.sort(numbers[i]);
       saveResultsToFile(sortedFileName.toString(),numbers[i]);
     }

       scanner.close();
    }catch(FileNotFoundException e){
     System.out.println("Error: " + e);
    }
  }

public static void saveResultsToFile(String fileName, int arr[]){
   try{
     File file = new File(fileName); 
     PrintWriter printer = new PrintWriter(file);

     for(int i : arr)
       printer.println(i);

     printer.close(); 
   }catch(FileNotFoundException e){
     System.out.println("Error :" + e);
   }

 }

}

My question is how am I supposed to break up a file into chunks? I happen to know exactly how many lines of text my file has because I created it, so it was easy to write this code...BUT the problem actually tells you the size of the file; as in memory, not how many LINES of text the file. I'm uncertain how to break up the data into "chunks of memory"(and how to size them) instead of lines of text. Also, if there is anything weird about my code, wrong, or bad practice PLEASE tell me, as I honestly don't know what I'm doing; I'm just trying to learn. As far as merging the sorted files back together, I don't know how to do that either, but I have an idea. I would like to try it before I ask for help on that part. Thanks!


Solution

  • This is how to get the size of the chunks that we want to break the file into:

    public static long chunkSize(File file){
      //We don't want to create more that 1024 temp files for sorting
      final long MAX_AMOUNT_OF_TEMP_FILES = 1024;
      long fileSize = file.length();
      long freeMemory = Runtime.getRuntime().freeMemory();
    
      //We want to divide the file size by the maximum amount of temp files we will use for sorting
      long chunkSize = fileSize / MAX_AMOUNT_OF_TEMP_FILES;
    
      //If the block size is less than half the available memory, then we can stand to make the block size larger
      if(chunkSize < freeMemory / 2)
         chunkSize = freeMemory / 2;
      else
         System.out.println("Me may potentially run out of memory");
    
      return chunkSize ;
    
    }