I have a huge text file (207 MB, 4 million lines) and I need to read it sequentially line by line.
Every line has this format:
20227993821NAME AND SURNAME NINIC NN08
I was using (for regular files) the Java library's FileReader and BufferedReader like this:
FileReader dataFile = new FileReader(directory);
BufferedReader data = new BufferedReader(dataFile);
String s;
while ((s = data.readLine()) != null) {
    //do stuff
}
with no problems, but with huge files it takes too much time to process.
I wonder what the best practice would be in such cases (another library, different methods, etc.); anything would be helpful.
The file is issued periodically by a government agency and it must be loaded into my software for data comparison.
Edit:
This code:
BufferedReader data = new BufferedReader(new FileReader(file));
String s;
int count = 0;
while ((s = data.readLine()) != null) {
    System.out.println(count + " - " + s);
    count++;
}
data.close();
executed in 19 minutes 30 seconds. I don't know why it took so long.
I have a 64-bit operating system and an i5 processor.
If I run
File file = new File("/tmp/deleteme.txt");
file.deleteOnExit();
long start = System.nanoTime();
PrintWriter pw = new PrintWriter(file);
for (int i = 0; i < 4 * 1000 * 1000; i++)
    pw.println("01234567890123456789012345678901234567890123456789");
pw.close();
long mid = System.nanoTime();
BufferedReader data = new BufferedReader(new FileReader(file));
String s;
while ((s = data.readLine()) != null) {
    //do stuff
}
data.close();
long end = System.nanoTime();
System.out.printf("Took %.3f seconds to write and %.3f seconds to read a %.2f MB file.%n",
(mid - start) / 1e9, (end - mid) / 1e9, file.length() / 1e6);
it prints
Took 0.465 seconds to write and 0.522 seconds to read a 204.00 MB file.
EDIT: If I print out each line, it slows down dramatically because writing to the screen takes a long time. I have found the MS-DOS window to be especially slow.
Took 0.467 seconds to write and 10.254 seconds to read a 204.00 MB file.
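If the per-line output is really needed, one way to reduce the console cost is to send it through a buffered writer instead of calling System.out.println for every line. This is only a minimal sketch: the class name BufferedEcho, the helper printNumbered and the 64 KiB buffer size are my own choices for illustration, and how much it helps will depend on the terminal.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.Writer;

public class BufferedEcho {

    // Hypothetical helper: copies every line from `in` to `out`,
    // prefixed with its index, and returns the number of lines written.
    static int printNumbered(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        // 64 KiB buffer; the size is an arbitrary assumption, not a tuned value.
        BufferedWriter writer = new BufferedWriter(out, 1 << 16);
        int count = 0;
        String s;
        while ((s = reader.readLine()) != null) {
            writer.write(count + " - " + s);
            writer.newLine();
            count++;
        }
        writer.flush(); // push whatever is still sitting in the buffer
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Demo input in the question's line format; a real run would pass a FileReader.
        Reader in = new StringReader("20227993821NAME AND SURNAME NINIC NN08\n");
        int n = printNumbered(in, new OutputStreamWriter(System.out));
        System.err.println(n + " lines written");
    }
}
```

Redirecting the output to a file (java MyProgram > out.txt) is another way to take the console out of the measurement.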
I don't believe it's the reading of the file that is taking too long; it is what you are doing with each line that is taking a long time.
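One way to check this on the real file is to time a read-only pass, with nothing done to each line, and compare it with the full run. A rough sketch, assuming a hypothetical class ReadTimer and an ISO-8859-1 encoding (the question does not say what encoding the file uses, so adjust as needed):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadTimer {

    // Reads the whole file line by line and does nothing else;
    // returns the number of lines seen.
    static long countLines(Path path) throws IOException {
        long lines = 0;
        // Encoding is an assumption; try-with-resources closes the reader.
        try (BufferedReader data = Files.newBufferedReader(path, StandardCharsets.ISO_8859_1)) {
            while (data.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args[0]); // file to measure, given on the command line
        long start = System.nanoTime();
        long lines = countLines(path);
        long end = System.nanoTime();
        System.out.printf("Read %d lines in %.3f seconds.%n", lines, (end - start) / 1e9);
    }
}
```

If this pass finishes in about a second while the full program takes minutes, the time is going into the per-line work (printing, parsing, comparison) and not into the file I/O.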