
Performance comparison: BufferedInputStream vs. Unbuffered Streams in Java


I created a file containing 1,000,000 zeroes (980K) and measured how long it takes to copy it to another file, using buffered and unbuffered streams for both input and output, with the default buffer size (8,192 bytes). Surprisingly, buffering both input and output improved performance significantly. This is the result:

While I understand that reads larger than the buffer size (8 KB) might negate the benefits of buffering, I don't understand why using the buffer still performs better than not using one, beyond the fact that it needs fewer system calls. The only downside I can see is the extra memory used by the buffer itself.

My questions are:

  1. Except in memory-constrained environments, why don't we always use buffering for both input and output?
  2. Why is it faster to copy a file when the output is buffered?

My code contains four methods, t1() through t4(). I won't share all of them here since they are similar to the one below:

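import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Copies text.txt to text_3.txt one byte at a time,
// with both input and output buffered (default 8,192-byte buffers).
static void t2() {
    System.out.println("Copy file using BufferedInputStream (8192), with BufferOut");
    long startTime = System.nanoTime();

    try (BufferedInputStream br = new BufferedInputStream(new FileInputStream("text.txt"));
         var bOut = new BufferedOutputStream(new FileOutputStream("text_3.txt"))) {
        int i;
        // read() returns the next byte (0-255), or -1 at end of stream
        while ((i = br.read()) != -1) {
            bOut.write(i);
        }
    } catch (IOException e) {
        System.out.println(e);
    }
    long endTime = System.nanoTime();
    long duration = (endTime - startTime) / 1_000_000; // nanoseconds -> milliseconds

    System.out.println("Duration: " + duration + " milliseconds");
}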

I've also tried in.transferTo(out) with both input and output buffered, and the result is even better (single-digit milliseconds). I also plan to experiment with different buffer sizes. I read that the optimal size depends on the file system's block size, which in my case is 4,096 bytes. Should I set the buffer size to 4,096 bytes, or is bigger always better?
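A minimal sketch of the planned buffer-size experiment, assuming the same text.txt and text_3.txt files and the same byte-at-a-time loop as t2(). The class name BufferSizeExperiment and the list of sizes tried are illustrative choices, not part of the original code. It relies on the two-argument BufferedInputStream and BufferedOutputStream constructors, which take an explicit buffer size.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class BufferSizeExperiment {
    // Same byte-at-a-time copy as t2(), but with an explicit buffer size
    // for both streams so that different sizes can be compared.
    // Returns the number of bytes copied.
    static long copy(String source, String target, int bufferSize) throws IOException {
        long bytes = 0;
        try (var in = new BufferedInputStream(new FileInputStream(source), bufferSize);
             var out = new BufferedOutputStream(new FileOutputStream(target), bufferSize)) {
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
                bytes++;
            }
        }
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        // File names follow the question; the sizes are arbitrary examples.
        int[] sizes = {512, 4_096, 8_192, 65_536};
        for (int size : sizes) {
            long start = System.nanoTime();
            long bytes = copy("text.txt", "text_3.txt", size);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(size + "-byte buffer: copied " + bytes + " bytes in " + ms + " ms");
        }
    }
}
```

A single timed run like this also measures JIT warm-up and OS page-cache effects, so the numbers are noisy; a benchmark harness such as JMH gives more trustworthy comparisons between sizes.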


Solution

  • There is also the low-level operating system I/O. Even when writing byte by byte in Java, there is probably some non-Java memory buffer at the OS level.

    Unbuffered Java I/O makes a lot of calls from Java into the OS, one per read() or write(), while the OS only occasionally transfers its own buffer. All those calls are very costly.

    Buffered Java I/O is considerably faster: it fills a Java buffer and copies it to the OS level in one single native call, after which the OS does the actual I/O (assuming buffers of the same size).
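To make the difference in call counts concrete, here is a small sketch that replaces the file with a hypothetical CountingSink stream. It discards the data and only counts how many write calls reach it, standing in for the native calls a FileOutputStream would make. Writing 1,000,000 bytes one at a time, the unbuffered path makes one call per byte, while BufferedOutputStream forwards data in chunks of up to its buffer size (8,192 bytes by default), so only about 1,000,000 / 8,192, i.e. roughly 120, calls get through.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class CallCountDemo {
    // Hypothetical stand-in for a native stream: discards data and
    // counts how many write calls arrive.
    static final class CountingSink extends OutputStream {
        long calls = 0;

        @Override
        public void write(int b) {
            calls++; // one call per single byte
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++; // one call per chunk, however large
        }
    }

    // Writes n bytes one at a time, optionally through a BufferedOutputStream,
    // and returns how many calls reached the sink.
    static long countCalls(int n, boolean buffered) {
        CountingSink sink = new CountingSink();
        OutputStream out = buffered ? new BufferedOutputStream(sink) : sink;
        try (out) { // closing flushes whatever is left in the buffer
            for (int i = 0; i < n; i++) {
                out.write(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.calls;
    }

    public static void main(String[] args) {
        int n = 1_000_000; // same size as the test file in the question
        System.out.println("unbuffered: " + countCalls(n, false) + " calls");
        System.out.println("buffered:   " + countCalls(n, true) + " calls");
    }
}
```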

    You are right that this costs a bit of memory for the Java buffer.

    There is a third alternative to buffered and unbuffered stream I/O: memory-mapped files (MappedByteBuffer). These operate directly on the OS-level buffer for creating, reading, and writing. Look up how to use a FileChannel together with a ByteBuffer; the code is not really harder to write.
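One possible way to copy through a memory-mapped file is sketched below: the source is mapped with FileChannel.map, which returns a MappedByteBuffer, and the mapped bytes are written to the target channel. The class name MappedCopy is illustrative, and the sketch assumes the source fits in a single mapping (at most Integer.MAX_VALUE bytes), which easily holds for the 980K test file.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedCopy {
    // Copies source to target by mapping the whole source file into memory
    // and writing the mapped region to the target channel.
    // Assumption: the source is at most Integer.MAX_VALUE bytes long.
    static void copy(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            MappedByteBuffer mapped = in.map(FileChannel.MapMode.READ_ONLY, 0, in.size());
            // A single write may not drain the buffer, so loop until it is empty.
            while (mapped.hasRemaining()) {
                out.write(mapped);
            }
        }
    }
}
```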

    It also seems you did not mention the great Files class. By using the more general Path instead of File, you can do all of this I/O easily, for example with Files.newByteChannel.
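A minimal sketch of a Path-based copy with Files.newByteChannel, moving data through a ByteBuffer in chunks. The class name ChannelCopy and the bufferSize parameter are illustrative; the caller picks the chunk size (8,192 would match the stream default).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelCopy {
    // Copies source to target through byte channels obtained from Files,
    // bufferSize bytes at a time.
    static void copy(Path source, Path target, int bufferSize) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(bufferSize);
        try (SeekableByteChannel in = Files.newByteChannel(source, StandardOpenOption.READ);
             SeekableByteChannel out = Files.newByteChannel(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            while (in.read(buf) != -1) {
                buf.flip();                // switch from filling to draining
                while (buf.hasRemaining()) {
                    out.write(buf);
                }
                buf.clear();               // make room for the next read
            }
        }
    }
}
```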

    P.S.

    One should also mention long InputStream#transferTo(OutputStream), which runs the whole read/write loop for you, though Files.copy exists as well. I gather that you used the more basic code to understand the underlying mechanism.
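For comparison, both higher-level options fit in a few lines. This is a sketch with an illustrative class name, HighLevelCopy. InputStream#transferTo (Java 9+) already reads in chunks, so the buffered wrappers here only mirror the question's setup and are optional; Files.copy leaves the whole copy to the JDK.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class HighLevelCopy {
    // Lets InputStream#transferTo run the read/write loop.
    // Returns the number of bytes transferred.
    static long copyWithTransferTo(Path source, Path target) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(source));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(target))) {
            return in.transferTo(out);
        }
    }

    // Delegates the entire copy to Files.copy; REPLACE_EXISTING overwrites the target.
    static void copyWithFiles(Path source, Path target) throws IOException {
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```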