
Java Random File Access: Get byte offset of line start


I need to randomly access specific records in a text (ASCII) file and then read from there until a specific "stop sequence" (record delimiter) is found. The file contains multi-line records, and the records are separated by the delimiter. Each record also spans a different number of lines! This is a well-known file format in its field and cannot be changed.

I want to index the file so I can quickly jump to a requested record.

In similar questions like

How to Access string in file by position in Java

and the links in it, the answers always reference the seek() method of various classes like RandomAccessFile. I know about that!

The issue I have is how to get the offset needed for seek() in the first place, i.e. how to index the file!

BufferedReader does not have a getFilePointer() method or any other way to get the current byte offset from the start of the file. RandomAccessFile has a readLine() method, but its performance is beyond terrible. It's not usable at all for my case.

I would need to read the file line by line, and each time the record delimiter is found, I need to get the byte offset. How can I achieve this?


Solution

  • After a lot of further googling and trial and error, I came up with a solution that simply wraps RandomAccessFile and exposes all of its methods. The readLine() method, however, was much improved by taking the one from BufferedReader with minor adjustments. Its performance is now identical to BufferedReader's.

    This class, OptimizedRandomAccessFile, buffers readLine() calls as long as no other methods that require or affect the position in the file are called. For example, in:

    OptimizedRandomAccessFile raf = new OptimizedRandomAccessFile(filePath, "r");
    String line = raf.readLine();
    int nextByte = raf.read();
    

    nextByte will contain the first byte of the next line in the file.

    The full code can be found on bitbucket.
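
If you only need the index and do not want to pull in a wrapper class, the same idea can be sketched with the plain JDK: read the file through a BufferedInputStream, count every byte consumed, and record the offset at which each record begins. This is not the OptimizedRandomAccessFile code from above, just a minimal sketch. The class name RecordIndexer, its two methods, and the "$$$$" delimiter used in the usage note are made up for illustration. It assumes an ASCII file, "\n" or "\r\n" line endings, and a delimiter that sits alone on its own line.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the JDK or of OptimizedRandomAccessFile.
public class RecordIndexer {

    /** Returns the byte offset at which each record starts. */
    public static List<Long> indexRecords(String path, String delimiter) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(new FileInputStream(path), 1 << 16)) {
            ByteArrayOutputStream line = new ByteArrayOutputStream();
            long pos = 0;                 // number of bytes consumed so far
            long recordStart = 0;         // offset of the first line of the current record
            boolean hasContent = false;   // current record has at least one complete line
            int b;
            while ((b = in.read()) != -1) {
                pos++;
                if (b != '\n') {
                    line.write(b);
                    continue;
                }
                // A full line has been read; pos now points at the start of the next line.
                String text = new String(line.toByteArray(), StandardCharsets.US_ASCII);
                line.reset();
                if (text.endsWith("\r")) {
                    text = text.substring(0, text.length() - 1); // tolerate \r\n
                }
                if (text.equals(delimiter)) {
                    offsets.add(recordStart);
                    recordStart = pos;    // the next record starts right after the delimiter line
                    hasContent = false;
                } else {
                    hasContent = true;
                }
            }
            // A last record that is not followed by "delimiter + newline" still counts.
            if (hasContent || line.size() > 0) {
                offsets.add(recordStart);
            }
        }
        return offsets;
    }

    /** Seeks to a previously indexed offset and reads up to (not including) the delimiter line. */
    public static String readRecord(String path, long offset, String delimiter) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(offset);
            StringBuilder sb = new StringBuilder();
            String line;
            // RandomAccessFile.readLine() is unbuffered, but here it only reads one record.
            while ((line = raf.readLine()) != null && !line.equals(delimiter)) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        }
    }
}
```

With the index built once via RecordIndexer.indexRecords(path, "$$$$"), record i can then be fetched with RecordIndexer.readRecord(path, offsets.get(i), "$$$$"). The offsets are byte positions, so they can be passed straight to seek().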