
Why is this "line count" program using MappedByteBuffer so slow in Java?


To try out MappedByteBuffer (memory-mapped files in Java), I wrote a simple wc -l (text file line count) demo:

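import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

int wordCount(String fileName) throws IOException {
    // try-with-resources closes the file (and its channel) when done
    try (RandomAccessFile raf = new RandomAccessFile(new File(fileName), "r")) {
        FileChannel fc = raf.getChannel();
        MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());

        int nlines = 0;
        byte newline = '\n';

        // Scan every byte of the mapping and count '\n' characters
        for (long i = 0; i < fc.size(); i++) {
            if (mem.get() == newline)
                nlines += 1;
        }

        return nlines;
    }
}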

I tried this on a file of about 15 MB (15,008,641 bytes) with 100k lines. On my laptop it takes about 13.8 seconds. Why is it so slow?

Complete class code is here: http://pastebin.com/t8PLRGMa

For reference, I wrote the same idea in C: http://pastebin.com/hXnDvZm6

It runs in about 28 ms, roughly 490 times faster.

Out of curiosity, I also wrote a Scala version using essentially the same algorithm and APIs as the Java one. It runs 10 times faster than the Java version, which suggests something odd is definitely going on.

Update: The file is cached by the OS, so there is no disk loading time involved.

I wanted to use memory mapping for random access to bigger files which may not fit into RAM. That is why I am not just using a BufferedReader.


Solution

  • The code is very slow because fc.size() is called on every iteration of the loop.

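    The JVM cannot eliminate or hoist the fc.size() call, since the file size can change at run time. Querying the file size is relatively slow because it requires a system call to the underlying file system, and with about 15 million bytes in the file the loop makes about 15 million of those calls.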

    Change the loop so that the size is read only once:

        long size = fc.size();
        for (long i = 0; i < size; i++) {
            ...
        }
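Putting the fix into a complete program, a minimal sketch might look like the following. The class name LineCount, the method name lineCount and the command-line main are illustrative assumptions, not the original pastebin code. It reads fc.size() once into a local before the loop, as suggested above, and closes the file with try-with-resources.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class LineCount {
    // Counts '\n' bytes in a file through a read-only memory mapping.
    // Note: FileChannel.map accepts at most Integer.MAX_VALUE bytes per mapping.
    static int lineCount(String fileName) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "r");
             FileChannel fc = raf.getChannel()) {
            long size = fc.size(); // one system call, outside the loop
            MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, size);

            int nlines = 0;
            for (long i = 0; i < size; i++) {
                if (mem.get() == '\n') {
                    nlines++;
                }
            }
            return nlines;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(lineCount(args[0]));
    }
}
```

Run it as java LineCount yourfile.txt. Because a single mapping is capped at Integer.MAX_VALUE bytes, files larger than about 2 GB would have to be mapped in several windows.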
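The point that the file size can change at run time is easy to check. The sketch below (the names SizeChanges and observeGrowth are hypothetical) keeps one FileChannel open, appends to the same file through a separate RandomAccessFile, and calls size() again. The second call reports the new length, which illustrates why the result of size() cannot simply be cached as a loop-invariant constant: another handle, or another process, may change it between calls.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SizeChanges {
    // Opens a read-only channel, then grows the file through a second,
    // independent handle. Returns the channel's size() before and after.
    static long[] observeGrowth(Path file) throws IOException {
        try (FileChannel reader = FileChannel.open(file, StandardOpenOption.READ);
             RandomAccessFile writer = new RandomAccessFile(file.toFile(), "rw")) {
            long before = reader.size();
            writer.seek(writer.length());
            writer.write(new byte[1024]); // append 1 KiB through the other handle
            long after = reader.size();   // a fresh query sees the new length
            return new long[] {before, after};
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("grow", ".bin");
        Files.write(tmp, new byte[100]);
        long[] sizes = observeGrowth(tmp);
        System.out.println(sizes[0] + " -> " + sizes[1]); // prints "100 -> 1124"
        Files.delete(tmp);
    }
}
```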