I am reading a huge text file of words (one word per line), but I have to stop from time to time and resume the read the next day. Right now I'm using Apache's LineIterator, but it's totally the wrong solution. My file is 7 GB and I had to interrupt reading it at around 1 GB. To resume the read I saved the number of lines already read, which means I have an if statement in the while loop. Apache's FileUtils doesn't allow seeking, so that was my workaround.
What is the best/fastest solution? I thought of using RandomAccessFile to get to the right line and continue reading, but I'm not sure whether I can get to the right place, or how to save the position I last read. I can reread a couple of lines, so precision is not that important, but I haven't found a way to get the pointer. I have a BufferedReader to read the file and a RandomAccessFile to seek to the right place, but I don't know how to periodically save a position with the BufferedReader. Any hints?
Code (note the "SOMETHING" where I should print the value I can later pass as seekToByte):
try {
RandomAccessFile rand = new RandomAccessFile(file,"r");
rand.seek(seekToByte);
startAtByte = rand.getFilePointer();
rand.close();
} catch(IOException e) {
// do something
}
// Do it using the BufferedReader
BufferedReader reader = null;
FileReader freader = null;
try {
freader = new FileReader(file);
reader = new BufferedReader(freader);
reader.skip(startAtByte);
long i=0;
for(String line; (line = reader.readLine()) != null; ) {
lines.add(line);
System.out.print(i+" ");
if (lines.size()>1000) {
commit(lines);
System.out.println("");
lines.clear();
System.out.println(SOMETHING?);
}
}
} catch(Exception e) {
// handle this
} finally {
if (reader != null) {
try {reader.close();} catch(Exception ignore) {}
}
}
RandomAccessFile is indeed one way to go. Use
long position = file.getFilePointer();
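when you stop reading, to save where you are in the file, and then restore it with:
file.seek(position);
to resume reading at the same place. Here file is the RandomAccessFile instance (rand in the question's code), not the BufferedReader.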
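As a rough illustration of the save/restore idea, here is a minimal sketch that reads one batch of lines and returns the byte offset to persist for the next run. The class name ResumableLineReader, the method readBatch, and its parameters are made up for this example, and it assumes you store the returned offset somewhere yourself (a small file, a database row, etc.). Keep in mind that RandomAccessFile.readLine is unbuffered, so this may be slow on a multi-gigabyte file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.List;

// Hypothetical helper, not part of any library.
final class ResumableLineReader {

    // Reads up to maxLines lines starting at byte offset startPosition,
    // appends them to out, and returns the byte offset to resume from.
    static long readBatch(String path, long startPosition, int maxLines, List<String> out)
            throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(startPosition);               // jump to the saved position
            String line;
            int count = 0;
            while (count < maxLines && (line = file.readLine()) != null) {
                out.add(line);
                count++;
            }
            return file.getFilePointer();           // first byte not yet consumed
        }
    }
}
```

Calling readBatch with 0 on the first run, and with the previously returned value on later runs, should continue exactly after the last line that was read.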
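However, be careful when using RandomAccessFile, as its readLine method does not completely support Unicode: it turns each byte into one char, so non-ASCII text in a multi-byte encoding such as UTF-8 comes out garbled.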
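One possible workaround, assuming the file is encoded as UTF-8 (the question does not say), is to turn the string returned by readLine back into its raw bytes and decode those properly. Since readLine maps each byte to the char with the same value, converting with ISO-8859-1 recovers the original bytes. The class name Utf8Lines and the method readUtf8Line are invented for this sketch.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class Utf8Lines {

    // Returns the next line decoded as UTF-8 (assumed encoding),
    // or null at end of file.
    static String readUtf8Line(RandomAccessFile file) throws IOException {
        String raw = file.readLine();               // one char per byte
        if (raw == null) {
            return null;
        }
        // Recover the original bytes, then decode them as UTF-8.
        byte[] bytes = raw.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

getFilePointer still reports byte offsets after each call, so the saved position stays valid regardless of how many bytes each character takes.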