java, inputstream, bufferedreader, inputstreamreader

Reading very large text files in java


I'm using the following code to read large files:

InputStreamReader isr = new InputStreamReader(new FileInputStream(FilePath));
BufferedReader br = new BufferedReader(isr);
String cur;
while ((cur = br.readLine()) != null) {
    // process cur
}

I'm able to read large files with the code above, but I want to know how these readers work internally in memory. What role does InputStreamReader play? How much memory gets allocated while reading a large file (e.g. 2 GB) line by line?


Solution

  • InputStreamReader is a facility to convert a raw InputStream (a stream of bytes) into a stream of characters, according to some charset. FileInputStream is a stream of bytes (it extends InputStream) read from a given file. You can use InputStreamReader to read text from a socket as well, since socket.getInputStream() also returns an InputStream.

    InputStreamReader is a Reader, the abstract class for a stream of characters. Using an InputStreamReader alone would be inefficient, as each read would go all the way down to the file. When you decorate it with a BufferedReader, the BufferedReader reads a large chunk of characters into memory at once and serves subsequent reads from that buffer. Reading line by line this way keeps only the buffer and the current line in memory at any time, so even a 2 GB file does not require 2 GB of heap.
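    As a minimal, self-contained sketch of that decorator chain (the temporary file here is purely illustrative, and UTF-8 is an assumed charset; the original code relies on the platform default):

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LineCount {
    public static void main(String[] args) throws IOException {
        // Hypothetical sample file, created only so the example runs standalone
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.write(tmp, "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8));

        long lines = 0;
        // FileInputStream -> bytes; InputStreamReader -> chars (UTF-8 here);
        // BufferedReader -> buffered chars plus the readLine() convenience
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(tmp.toFile()),
                                      StandardCharsets.UTF_8))) {
            while (br.readLine() != null) {
                lines++;
            }
        }
        System.out.println(lines);
        Files.delete(tmp);
    }
}
```

    However large the file, the loop holds only the buffer and one line at a time.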

    About the size: the documentation does not state the default value:

    https://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html

    The buffer size may be specified, or the default size may be used. The default is large enough for most purposes.

    You must check the source file to find the value.

    https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/classes/java/io/BufferedReader.java

    This is the implementation in the OpenJDK:

     private static int defaultCharBufferSize = 8192;
    

    Oracle's closed-source JDK implementation may differ.
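    If the 8192-char default is not what you want, BufferedReader's two-argument constructor lets you set the buffer size explicitly. A small sketch (an in-memory StringReader stands in for a file here, and the 64 KiB size is an arbitrary example value):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class BufferSizeDemo {
    public static void main(String[] args) throws IOException {
        // A Reader over an in-memory string stands in for a file source
        StringReader source = new StringReader("alpha\nbeta\ngamma\n");
        // Second constructor argument is the buffer size in chars
        // (64 * 1024 here, instead of the 8192-char default)
        try (BufferedReader br = new BufferedReader(source, 64 * 1024)) {
            int count = 0;
            while (br.readLine() != null) {
                count++;
            }
            System.out.println(count);
        }
    }
}
```

    A larger buffer can reduce the number of reads against the underlying stream; it does not change how much of the file is kept in memory overall.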