Java: Read a Large Text File With 70 Million Lines of Text


I have a big text file with 70 million lines of text. I have to read the file line by line.

I used two different approaches:

InputStreamReader isr = new InputStreamReader(new FileInputStream(FilePath),"unicode");
BufferedReader br = new BufferedReader(isr);
String cur;
while((cur=br.readLine()) != null);

and

LineIterator it = FileUtils.lineIterator(new File(FilePath), "unicode");
while(it.hasNext()) cur=it.nextLine();

Is there another approach that can make this task faster?


Solution

  • 1) There should be no noticeable difference in speed: both approaches read through a FileInputStream internally and both buffer the input.

    2) You can take measurements and see for yourself
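
    A minimal sketch of such a measurement, assuming a UTF-8 file named test.txt (or a path passed as the first argument). The class name ReadTiming and the choice of three runs are only illustrative. The loop counts lines so that the read cannot be optimized away, and it repeats the pass because the first run typically pays for JIT warm-up and a cold file cache:

    ```java
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class ReadTiming {
        // Reads every line and returns the line count, so the work has a visible result.
        public static long countLines(Path path) throws IOException {
            long count = 0;
            try (BufferedReader br = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
                while (br.readLine() != null) {
                    count++;
                }
            }
            return count;
        }

        public static void main(String[] args) throws IOException {
            Path path = Paths.get(args.length > 0 ? args[0] : "test.txt");
            // Repeat the pass a few times and compare the later runs, not the first one.
            for (int run = 1; run <= 3; run++) {
                long start = System.nanoTime();
                long lines = countLines(path);
                long millis = (System.nanoTime() - start) / 1_000_000;
                System.out.println("run " + run + ": " + lines + " lines in " + millis + " ms");
            }
        }
    }
    ```

    Swapping the body of countLines for another reading approach gives a rough side-by-side comparison. A benchmark harness such as JMH would give more trustworthy numbers.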

    3) Though there is no performance benefit, I like the Java 1.7 approach:

    try (BufferedReader br = Files.newBufferedReader(Paths.get("test.txt"), StandardCharsets.UTF_8)) {
        for (String line = null; (line = br.readLine()) != null;) {
            // process line
        }
    }
    

    4) Scanner-based version:

        try (Scanner sc = new Scanner(new File("test.txt"), "UTF-8")) {
            while (sc.hasNextLine()) {
                String line = sc.nextLine();
            }
            // note that Scanner suppresses exceptions
            if (sc.ioException() != null) {
                throw sc.ioException();
            }
        }
    

    5) This may be faster than the rest

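    try (SeekableByteChannel ch = Files.newByteChannel(Paths.get("test.txt"))) {
        ByteBuffer bb = ByteBuffer.allocateDirect(1000);
        StringBuilder line = new StringBuilder(); // kept across reads: a line can span buffers
        while (ch.read(bb) != -1) {               // read() returns -1 at end of file
            bb.flip();                            // switch the buffer from filling to draining
            // add chars to line
            // ...
            bb.clear();                           // reset the buffer for the next read
        }
    }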
    

    It requires a bit of coding, but it can be considerably faster because of ByteBuffer.allocateDirect, which lets the OS read bytes from the file straight into the ByteBuffer without an intermediate copy.
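
    One possible way to fill in the "add chars to line" part of approach 5 is sketched below. It assumes the file is UTF-8 (or another encoding in which the byte 0x0A only ever means a line feed), so lines can be split at the byte level and decoded afterwards. The class name ChannelLineReader, the method forEachLine and the 64 KB buffer size are illustrative choices, not part of the original answer:

    ```java
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.SeekableByteChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.function.Consumer;

    public class ChannelLineReader {
        // Reads the file through a direct buffer and hands each line to the callback.
        public static void forEachLine(Path path, Consumer<String> action) throws IOException {
            try (SeekableByteChannel ch = Files.newByteChannel(path)) {
                ByteBuffer bb = ByteBuffer.allocateDirect(64 * 1024);
                // Bytes of the current line; survives across reads because a line can span buffers.
                ByteArrayOutputStream pending = new ByteArrayOutputStream();
                while (ch.read(bb) != -1) {
                    bb.flip();                       // switch from filling to draining
                    while (bb.hasRemaining()) {
                        byte b = bb.get();
                        if (b == '\n') {             // end of line: decode what was collected
                            action.accept(decode(pending));
                            pending.reset();
                        } else {
                            pending.write(b);
                        }
                    }
                    bb.clear();                      // make the buffer writable again
                }
                if (pending.size() > 0) {            // last line without a trailing newline
                    action.accept(decode(pending));
                }
            }
        }

        // Decodes one line as UTF-8 and drops a trailing '\r' from Windows line endings.
        private static String decode(ByteArrayOutputStream bytes) {
            String line = new String(bytes.toByteArray(), StandardCharsets.UTF_8);
            return line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
        }
    }
    ```

    Usage would look like ChannelLineReader.forEachLine(Paths.get("test.txt"), line -> { /* process line */ }). Copying byte by byte out of the direct buffer has its own cost, so whether this beats BufferedReader on a given machine is something to measure rather than assume.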

    6) Parallel processing should increase speed. Allocate a big byte buffer and run several tasks that read bytes from the file into that buffer in parallel. When they are done, find the first end of line, make a String, find the next one, and so on.
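
    The idea in point 6 could be sketched roughly as follows. This is only an illustration under several assumptions: the file is UTF-8, it fits into a single byte[] (under 2 GB, which a 70-million-line file may well exceed, in which case it would have to be processed in several windows), and tasks is at least 1. The names ParallelLineReader and readLines are made up for the example. Each task uses the positional FileChannel.read(ByteBuffer, long), which does not touch the channel's own position, to fill its own region of the shared array:

    ```java
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelLineReader {
        public static List<String> readLines(Path path, int tasks) throws Exception {
            long size = Files.size(path);
            if (size > Integer.MAX_VALUE - 8) {
                throw new IOException("file too large for a single byte[] in this sketch");
            }
            final byte[] all = new byte[(int) size];
            final long chunk = Math.max(1, (size + tasks - 1) / tasks);
            ExecutorService pool = Executors.newFixedThreadPool(tasks);
            try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
                List<Future<?>> pending = new ArrayList<>();
                for (long start = 0; start < size; start += chunk) {
                    final int from = (int) start;
                    final int length = (int) Math.min(chunk, size - start);
                    pending.add(pool.submit(() -> {
                        // Each task fills only its own region of the shared array.
                        ByteBuffer slice = ByteBuffer.wrap(all, from, length);
                        long position = from;
                        while (slice.hasRemaining()) {
                            int n = ch.read(slice, position);
                            if (n < 0) break;        // file shrank while reading
                            position += n;
                        }
                        return null;
                    }));
                }
                for (Future<?> f : pending) {
                    f.get();                         // wait, and surface any read failure
                }
            } finally {
                pool.shutdown();
            }

            // Sequential pass: cut the buffer at each '\n' and decode the pieces.
            List<String> lines = new ArrayList<>();
            int lineStart = 0;
            for (int i = 0; i < all.length; i++) {
                if (all[i] == '\n') {
                    int end = (i > lineStart && all[i - 1] == '\r') ? i - 1 : i;
                    lines.add(new String(all, lineStart, end - lineStart, StandardCharsets.UTF_8));
                    lineStart = i + 1;
                }
            }
            if (lineStart < all.length) {            // last line without a trailing newline
                lines.add(new String(all, lineStart, all.length - lineStart, StandardCharsets.UTF_8));
            }
            return lines;
        }
    }
    ```

    Whether the parallel reads actually help depends on the storage: on a single spinning disk the tasks may simply compete for the same head, while on SSDs or a warm page cache they are more likely to pay off. Collecting every line into a List is done here only to keep the sketch short; for 70 million lines, handing each line to a callback would use far less memory.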