
RandomAccessFile readInt


How do I read numbers from a file? When I use the readInt method, I get a big number that is not equal to the number in the file. How can I fix this?

Scanner is not a good idea, because the file contains more than 1000 million numbers, so it would take a very long time.

This is a text file. The file contains numbers separated by spaces, for example (test.txt):

1 2 4 -4004 15458 8876 
```java
public static void readByMemoryMappedFile(int buffer[], String filename) throws IOException {
  int count = 0;
  RandomAccessFile raf = new RandomAccessFile(filename, "r");
  try {

    MappedByteBuffer mapFile = raf.getChannel().map(MapMode.READ_ONLY, 0, raf.length());

    StringBuilder b = new StringBuilder();
    try {
      while (mapFile.hasRemaining()) {
        byte read = mapFile.get();
        if (read == ' ' && b.length() > 0) {
          buffer[count++] = mapFile.getInt(); //Integer.parseInt(b.toString());
          b.delete(0, b.length());
        } else {
          b.append((char) read);
        }
      }
    } catch (BufferUnderflowException e) {
      // that's it, the file has ended
    }
    if (b.length() > 0) {
      buffer[count++] = Integer.parseInt(b.toString());
    }
  } finally {
    raf.close();
  }
}
```
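A likely explanation for the "big number": `MappedByteBuffer.getInt()` reads four raw bytes and interprets them as a binary int, but this file stores numbers as ASCII text, so the digits accumulated in `b` need `Integer.parseInt` (the commented-out alternative) instead. A minimal, self-contained sketch of the text-parsing approach, using a hypothetical `parseInts` helper over an in-memory byte array:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TextIntParse {
    // Parse space/newline-separated ASCII integers from raw bytes.
    // getInt() would read 4 raw bytes as a binary int; here we build
    // the digit string and let Integer.parseInt convert it.
    public static int[] parseInts(byte[] data) {
        int[] out = new int[16]; // small fixed capacity for this sketch
        int count = 0;
        StringBuilder b = new StringBuilder();
        for (byte read : data) {
            boolean separator = (read == ' ' || read == '\n');
            if (separator && b.length() > 0) {
                out[count++] = Integer.parseInt(b.toString());
                b.setLength(0);
            } else if (!separator) {
                b.append((char) read);
            }
        }
        if (b.length() > 0) {
            out[count++] = Integer.parseInt(b.toString());
        }
        return Arrays.copyOf(out, count);
    }

    public static void main(String[] args) {
        byte[] data = "1 2 4 -4004 15458 8876".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.toString(parseInts(data)));
        // → [1, 2, 4, -4004, 15458, 8876]
    }
}
```

The same byte-by-byte loop works unchanged over a `MappedByteBuffer`; only the source of the bytes differs.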

I have attached a timing report:

```none
// operation: time
reading: 39719   // t0
reading: 28297   // t1
reading: 56719   // t2
reading: 125735  // t3
reading: 199000  // t4

t0 < t1 < t2 < t3 < t4
```

How can I change my program's behavior so that t0 ≈ t1 ≈ t2 ≈ t3 ≈ t4?


Solution

  • If you want to randomly access data, you need to be able to determine where to start and where to finish. With a text format this can be tricky and you may have to read all the previous lines/text to find the one you want.

    With binary formats, you may be able to calculate exactly where you want to read, but you need to know how the number was encoded. e.g. was it big endian or little endian?

    Scanner may not be optimal for text, and it is useless for binary data, but it may be more than fast enough.

    Much of the time taken to scan a large file is the time it takes to read it off disk (assuming it won't fit in memory). You can speed this up significantly if the file compresses well, as text full of numbers does. Instead of taking 20 seconds to read, it might take only 2 seconds compressed. (And it might then fit in the OS file cache.)
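The endianness point above can be sketched concretely: the same four bytes on disk decode to two very different ints depending on the byte order assumed. `ByteBuffer.order` lets you choose:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    public static void main(String[] args) {
        byte[] raw = {0x01, 0x00, 0x00, 0x00}; // four bytes as stored on disk

        // Java's readInt/getInt default to big-endian: 0x01000000
        int big = ByteBuffer.wrap(raw).order(ByteOrder.BIG_ENDIAN).getInt();

        // The same bytes read little-endian: 0x00000001
        int little = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getInt();

        System.out.println(big);    // → 16777216
        System.out.println(little); // → 1
    }
}
```

This is also why calling `getInt()` on ASCII text produces a "big number": four text bytes get reinterpreted as one big-endian binary int.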
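The compression idea can be sketched with the standard `GZIPInputStream`/`GZIPOutputStream`; this example round-trips in memory, but in practice you would wrap a `FileInputStream` over a `.gz` file so that much less data crosses the disk:

```java
import java.io.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipNumbers {
    // Compress a string, then stream-read it back through a GZIP stream.
    // With a real file, the decompression cost is usually far smaller
    // than the disk-read time saved.
    public static String roundTrip(String text) throws IOException {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(new GZIPOutputStream(compressed), "US-ASCII")) {
            w.write(text);
        }
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed.toByteArray())), "US-ASCII"))) {
            return r.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("1 2 4 -4004 15458 8876"));
        // → 1 2 4 -4004 15458 8876
    }
}
```

Note that gzip streams cannot be seeked into the middle of, so compression trades away random access; it helps the sequential-scan case the answer describes.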