Tags: java, file, io, random-access, randomaccessfile

randomAccessFile.readLine() returns null after many calls even though EOF has not been reached?


I have a file with 10K lines.

I read it in chunks of 200 lines.

The problem is that after 5,600 lines (chunk 28), randomAccessFile.readLine() returns null.

However, if I start reading from chunk 29, it reads one more chunk and then stops (returns null).

If I force reading from chunk 30, the same thing happens: it reads one chunk and stops.

This is my code:

private void addRequestsToBuffer(int fromChunkId, List<String> requests) {
    String line;
    while (requests.size() < chunkSizeInLines) {
        // Stop as soon as readLine() reports EOF (null).
        if ((line = readNextLine()) == null) {
            return;
        }
        int httpPosition = line.indexOf("http");
        // Global line index: lines in earlier chunks plus lines read so far.
        int index = fromChunkId * chunkSizeInLines + requests.size();
        requests.add(index + ") " + line.substring(httpPosition));
    }
}

private String readNextLine() {
    String line;
    try {
        line = randomAccessFile.readLine();
        if (line == null) {
            System.out.println("randomAccessFile.readLine() returned null");
        }

    } catch (IOException ex) {
        ex.printStackTrace();
        throw new RuntimeException(ex);
    }
    return line;
}


@Override
public List<String> getNextRequestsChunkStartingChunkId(int fromChunkId) {
    List<String> requests = new ArrayList<>();
    int linesNum = 0;
    try {
        for (int i = 0; i < fromChunkId; i++) {
            while ((linesNum < chunkSizeInLines) && (randomAccessFile.readLine()) != null) {
                linesNum++;
            }
            linesNum = 0;
        }
        addRequestsToBuffer(fromChunkId, requests);
    } catch (IOException ex) {
        ex.printStackTrace();
        throw new RuntimeException(ex);
    }
    return requests;
}

What could cause this? Does RandomAccessFile time out?


Solution

  • Each time you call getNextRequestsChunkStartingChunkId you're skipping the specified number of chunks, without "rewinding" the RandomAccessFile to the start. So for example, if you call:

    getNextRequestsChunkStartingChunkId(0);
    getNextRequestsChunkStartingChunkId(1);
    getNextRequestsChunkStartingChunkId(2);
    

    you'll actually read:

    • Chunk 0 (leaving the stream at the start of chunk 1)
    • Chunk 2 (leaving the stream at the start of chunk 3)
    • Chunk 5 (leaving the stream at the start of chunk 6)
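
The arithmetic behind that list can be checked with a small simulation. This is only a sketch: the class and method names are made up for illustration, the file pointer is modelled as a chunk index rather than a byte offset, and EOF is ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the code in the question.
class SkipSimulation {
    // For each requested chunk id, returns the chunk that is actually read
    // when the file pointer is never rewound between calls.
    static List<Integer> chunksActuallyRead(int... requestedChunkIds) {
        List<Integer> read = new ArrayList<>();
        int position = 0; // chunk the file pointer currently sits at
        for (int requested : requestedChunkIds) {
            position += requested; // the method skips `requested` chunks from here
            read.add(position);    // then reads the chunk it landed on
            position++;            // and stops at the start of the next chunk
        }
        return read;
    }
}
```

Calling SkipSimulation.chunksActuallyRead(0, 1, 2) returns [0, 2, 5], matching the list above.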

    Options:

    • Read the chunks sequentially, without skipping anything
    • Rewind at the start of the method

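    Unfortunately, you can't use seek to jump straight to a given chunk, because your chunks aren't equally sized in terms of bytes, so the byte offset of chunk N can't be computed from N alone. Rewinding with seek(0) is still fine, since offset 0 is always the start of the file.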
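
A minimal sketch of the "rewind at the start of the method" option follows. It assumes a hypothetical wrapper class (RewindingChunkReader, with the file and chunk size passed in through the constructor) rather than the fields used in the question. The only seek it needs is seek(0), which requires no knowledge of chunk byte sizes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the name and constructor are not from the question.
class RewindingChunkReader {
    private final RandomAccessFile file;
    private final int chunkSizeInLines;

    RewindingChunkReader(RandomAccessFile file, int chunkSizeInLines) {
        this.file = file;
        this.chunkSizeInLines = chunkSizeInLines;
    }

    // Returns the lines of the given chunk, or fewer (possibly none) if EOF is hit.
    List<String> readChunk(int chunkId) {
        List<String> lines = new ArrayList<>();
        try {
            // Rewind so that skipping always starts from the first line,
            // no matter where the previous call left the file pointer.
            file.seek(0);

            // Skip chunkId * chunkSizeInLines lines, stopping early at EOF.
            long linesToSkip = (long) chunkId * chunkSizeInLines;
            for (long i = 0; i < linesToSkip; i++) {
                if (file.readLine() == null) {
                    return lines; // requested chunk lies beyond EOF
                }
            }

            // Read up to one chunk of lines.
            String line;
            while (lines.size() < chunkSizeInLines && (line = file.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return lines;
    }
}
```

With this approach every call re-reads the file from the beginning, so the cost of a call grows with the chunk id. If the chunks are always requested in order, reading sequentially without skipping avoids that cost.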