Ruby: start reading at arbitrary point in large file


I have some log files I would like to sift through. The content is exactly what you would expect in a log file: many single lines of comma-separated text. The files are about 4 GB each. File.each_line or File.foreach takes about 20 minutes for one of them.

Since a simple foreach seems... simple (and slow), I was thinking that two separate threads might be able to work on the same file if I could only tell them where to start. But based on my (limited) knowledge, I can't decide if this is even possible.

Is there a way to start reading the file at an arbitrary line?


Solution

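  • Starting at an arbitrary line is difficult, because line lengths vary and you cannot know where line N begins without scanning everything before it. You can, however, seek within a file to a certain byte.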

    IO#seek and IO#pos= will both allow you to move to a given byte within the file (IO#pos without the = only reports the current position).
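
As a rough illustration of the byte-seeking idea, here is a minimal sketch of how a reader could start at an arbitrary byte offset and still only see whole lines. The helper name `each_line_in_range` and its parameters are hypothetical, not part of Ruby's standard library. It assumes newline-terminated lines and opens the file in binary mode so that offsets are plain byte counts.

```ruby
# Hypothetical helper (not a standard library method): yields every complete
# line whose first byte lies in the byte range [start_offset, end_offset).
def each_line_in_range(path, start_offset, end_offset)
  return enum_for(:each_line_in_range, path, start_offset, end_offset) unless block_given?

  File.open(path, "rb") do |file|
    if start_offset > 0
      # Step back one byte and read through the next newline. If the offset
      # already sits on a line start, this consumes only the preceding "\n";
      # otherwise it discards the partial line we landed in.
      file.seek(start_offset - 1, IO::SEEK_SET)
      file.gets
    end

    # file.pos is the start of the next line; stop once it leaves our range.
    while file.pos < end_offset && (line = file.gets)
      yield line
    end
  end
end
```

Two workers could then each take half of the file, for example `each_line_in_range(path, 0, File.size(path) / 2)` and `each_line_in_range(path, File.size(path) / 2, File.size(path))`. Because a line belongs to whichever range contains its first byte, no line should be processed twice or skipped. Whether splitting the work this way actually shortens the 20 minutes depends on where the time goes (disk reads versus per-line processing), so it is worth measuring before committing to it.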