
Which file reader is best among BufferedInputStream, LineNumberReader, and Stream in Java in terms of memory, processor usage, and time?


I tried all three reading approaches but can't judge which is best in terms of

memory utilization, processor usage, and time complexity.

I have seen many solutions online, but none of them reaches a clear conclusion on these criteria.

I have tried a few things; please check the code below and let me know how to make it more optimal for the requirements above.

Below is my code.

NOTE: out.txt is a 3 GB text file.

package Reader;

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.LineNumberReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

/*
 *  Comparing execution time of BufferedInputStream vs LineNumberReader vs
 *  Stream.
 *  o/p > Efficiency of BufferedInputStream relative to LineNumberReader is
 *  around 200%.
 */
public class LineReaderBufferInputStream {

public static void main(String args[]) throws IOException {
    //LineReaderBufferInputStream
    LineReaderBufferInputStream lr = new LineReaderBufferInputStream();
    long startTime = System.nanoTime();

    int count = lr.countLinesUsingLineNumberReader("D://out.txt");

    long endTime = System.nanoTime();
    long c1 = (endTime - startTime);
    System.out.println(count + " LineReaderBufferInputStream Time taken:: " + c1);

    startTime = System.nanoTime();

    count = countLinesByBufferIpStream("D://out.txt");

    endTime = System.nanoTime();
    long c2 = (endTime - startTime);
    System.out.println(count + " BufferedInputStream Time taken:: " + c2);

    System.out.println("Effeciency of BufferInputReader to LineNumberReader is around :: " + (c1) / c2 * 100 + "%");

    // Java8 line by line reader
    //read file into stream, try-with-resources
    startTime = System.nanoTime();
    long cn = countLinesUsingStream("D://out.txt");
    endTime = System.nanoTime();

    System.out.println(cn +" Using Stream :: " + (endTime - startTime));

}

public int countLinesUsingLineNumberReader(String filename) throws IOException {
    LineNumberReader reader = new LineNumberReader(new FileReader(filename));
    int cnt = 0;
    String lineRead = "";
    while ((lineRead = reader.readLine()) != null) {
        //if you need to do anything with lineReader.
    }

    cnt = reader.getLineNumber();
    reader.close();
    return cnt;
}

public static int countLinesByBufferIpStream(String filename) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];
        int count = 0;
        int readChars = 0;
        boolean empty = true;
        while ((readChars = is.read(c)) != -1) {
            empty = false;
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n') {
                    ++count;
                }
            }
        }
        // Counts '\n' characters, so the result can differ by one from the
        // reader-based counts, depending on whether the file ends with a newline.
        return (count == 0 && !empty) ? 1 : count;
    } finally {
        is.close();
    }
}

public static long countLinesUsingStream(String fileName) throws IOException {
    // Use the fileName parameter instead of a hard-coded path.
    try (Stream<String> streamReader = Files.lines(Paths.get(fileName))) {
        return streamReader.count();
    }
}

}


Solution

  • One remark: it is good to pass the encoding explicitly when reading a portable file, as the default encoding may vary.

    The older APIs convert file bytes to a Unicode String using the platform default encoding.

    The newer Files.lines uses UTF-8 by default (hurray).

    This means UTF-8 decoding is a bit slower and is error prone on invalid non-ASCII bytes, as UTF-8 multi-byte sequences must have a correct bit format. Passing the charset explicitly, as in the sketch below, avoids such surprises.
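
    A minimal sketch, assuming the D://out.txt file from the question (the class name is just illustrative):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.stream.Stream;

    public class ExplicitCharsetCount {
        public static void main(String[] args) throws IOException {
            Path path = Paths.get("D://out.txt");

            // Files.lines with an explicit charset instead of the UTF-8 default.
            try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
                System.out.println("Files.lines count: " + lines.count());
            }

            // Files.newBufferedReader also accepts an explicit charset.
            try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.ISO_8859_1)) {
                long count = 0;
                while (reader.readLine() != null) {
                    count++;
                }
                System.out.println("BufferedReader count: " + count);
            }
        }
    }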

    1. In general Files.lines and alternatives like Files.newBufferedReader are sufficiently fast.

    2. For huge files one might use a ByteBuffer/CharBuffer or a memory-mapped file via a FileChannel; examples are easy to find on the net, and a rough sketch follows this list. The gain is not that large.
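
    A rough, illustrative sketch of counting newlines through a memory-mapped file (class and method names are made up; the file is mapped in chunks because a single MappedByteBuffer cannot exceed 2 GB, and the 3 GB out.txt is larger than that):

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class MappedLineCount {
        public static long countNewlines(String fileName) throws IOException {
            long count = 0;
            try (FileChannel channel = FileChannel.open(Paths.get(fileName), StandardOpenOption.READ)) {
                long size = channel.size();
                long position = 0;
                // Map at most Integer.MAX_VALUE bytes at a time.
                while (position < size) {
                    long chunk = Math.min(size - position, Integer.MAX_VALUE);
                    MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, position, chunk);
                    for (int i = 0; i < chunk; i++) {
                        if (buffer.get(i) == '\n') {   // count raw newline bytes, no text conversion
                            count++;
                        }
                    }
                    position += chunk;
                }
            }
            return count;
        }
    }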

    Not converting at all, i.e. working on bytes with a (Buffered)InputStream or ByteBuffer, is faster than converting to text.

    Java stores (Unicode) text in a String as an array of 2-byte chars. Recent Java versions can alternatively store it in a single-byte encoding (compact strings, controlled by a JVM option), which may save memory.

    Possibly better still would be to keep the text compressed, for instance as out.txt.gz, trading CPU time against disk speed; an illustrative sketch follows.
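
    A sketch of counting lines in a gzipped file (the out.txt.gz name is hypothetical; decompression happens on the fly, trading CPU for fewer disk reads):

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.GZIPInputStream;

    public class GzipLineCount {
        public static long countLines(String gzFileName) throws IOException {
            // Decompress while reading; an explicit charset avoids platform-dependent decoding.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(
                            new GZIPInputStream(new FileInputStream(gzFileName)),
                            StandardCharsets.UTF_8))) {
                long count = 0;
                while (reader.readLine() != null) {
                    count++;
                }
                return count;
            }
        }
    }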