
Which is more efficient to get every nth line in text file? Iterate with Java's BufferedReader, or split into subfiles then take top line of each?


I have a very large data set, and I want the fastest way to get every nth line (for example, if the file is 1M lines long, I'd want every 1000th line).

Ideally I'm looking for a way to jump to each line number, but I haven't found a way to do that yet.

My workaround is to split the original data file (using the Unix "split" command) and then take the top line of each subfile.

I'm curious if there is a way to jump to a specific line number in Java without iterating through other lines in the file. If not, is it more efficient to split the file, or use BufferedReader until I get to my desired line?

Any help is greatly appreciated!


Solution

  • Splitting into subfiles has nothing to recommend it: it adds latency and wastes space, and it's the same work as your first approach plus more.

    You can read millions of lines per second with a BufferedReader. Do it the simple way: use a LineNumberReader, which extends BufferedReader, and read lines until the line count reaches the one(s) you want.
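
    A minimal sketch of that approach: LineNumberReader tracks the current line number as you read, so selecting every nth line is a single pass with a modulo check. The helper name everyNth and the demo input are illustrative, not from the original post; for a real file you would pass a FileReader instead of the in-memory StringReader.

    ```java
    import java.io.IOException;
    import java.io.LineNumberReader;
    import java.io.Reader;
    import java.io.StringReader;
    import java.util.ArrayList;
    import java.util.List;

    public class EveryNthLine {
        // Collect every nth line (lines n, 2n, 3n, ...) from the given reader.
        static List<String> everyNth(Reader source, int n) throws IOException {
            List<String> picked = new ArrayList<>();
            try (LineNumberReader reader = new LineNumberReader(source)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // getLineNumber() is the count of lines read so far,
                    // so it is 1 after the first readLine() call.
                    if (reader.getLineNumber() % n == 0) {
                        picked.add(line);
                    }
                }
            }
            return picked;
        }

        public static void main(String[] args) throws IOException {
            // Demo on an in-memory "file" of 10 lines; with a real file,
            // use new FileReader("data.txt") (hypothetical path) instead.
            StringBuilder sb = new StringBuilder();
            for (int i = 1; i <= 10; i++) {
                sb.append("line ").append(i).append('\n');
            }
            System.out.println(everyNth(new StringReader(sb.toString()), 3));
            // prints [line 3, line 6, line 9]
        }
    }
    ```

    This is still a linear scan; there is no way to seek directly to a line number in a plain text file, because line lengths vary and the byte offset of line n is unknown without reading up to it.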