
Is FileInputStream Really Unbuffered?


FileInputStream uses this native method to read bytes into a user-supplied buffer. As we can see in the implementation, a user-space buffer is indeed allocated, with length max(BUF_SIZE, supplied buffer length). This buffer is then passed to IO_Read. I could not pull up the implementation of this method but I am pretty sure this is the standard C buffered reading. Why is FileInputStream then said to be unbuffered? And what additional buffering is BufferedReader using?

Edit: Benchmark comparing read times of a 100M file and its results:

Benchmark                         Mode  Cnt      Score      Error  Units
StreamIOBenchmark.bisMultiBytes   avgt   30     27.640 ±    1.117  ms/op
StreamIOBenchmark.bisSingleBytes  avgt   30    400.552 ±   26.921  ms/op
StreamIOBenchmark.fisMultiBytes   avgt   30     25.231 ±    1.459  ms/op
StreamIOBenchmark.fisSingleBytes  avgt   30  97991.213 ± 5620.685  ms/op

The real difference shows up when reading single bytes, because FIS makes a new system call for each byte. However, when reading 8k bytes at once, FIS (fisMultiBytes) and BIS (bisMultiBytes) perform about the same.
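
The gap in the single-byte rows comes down to how many read calls reach the underlying stream. The following is a minimal sketch of that effect, not the benchmark itself: it assumes an in-memory ByteArrayInputStream in place of the file, and a hypothetical CountingInputStream wrapper whose counter stands in for system calls.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadCallCounter {

    /** Hypothetical wrapper that counts read calls reaching the underlying stream. */
    static final class CountingInputStream extends FilterInputStream {
        long calls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads the stream one byte at a time until end of stream. */
    static void drainSingleBytes(InputStream in) throws IOException {
        while (in.read() != -1) {
            // discard the byte; only the number of calls matters here
        }
    }

    /** Single-byte reads go straight to the counted stream. */
    static long callsUnbuffered(int size) throws IOException {
        CountingInputStream counted =
                new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        drainSingleBytes(counted);
        return counted.calls;
    }

    /** Single-byte reads go through an 8 KiB BufferedInputStream first. */
    static long callsBuffered(int size) throws IOException {
        CountingInputStream counted =
                new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        drainSingleBytes(new BufferedInputStream(counted, 8192));
        return counted.calls;
    }

    public static void main(String[] args) throws IOException {
        int size = 1 << 20; // 1 MiB of zero bytes
        System.out.println("underlying reads, unbuffered: " + callsUnbuffered(size));
        System.out.println("underlying reads, buffered:   " + callsBuffered(size));
    }
}
```

In the unbuffered case, every read() is forwarded, so the count grows with the file size. In the buffered case, the count grows with the file size divided by the buffer size.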


Solution

  • Is FileInputStream really unbuffered?

    Yes. Really.

    I could not pull up the implementation of this method but I am pretty sure this is the standard C buffered reading.

    No, it isn't. The native code is NOT using C buffered I/O. It performs a read(2) call into a temporary buffer, then copies the data from the temporary buffer to the caller-supplied byte[].

    In the Java 11u code I am looking at, the temporary buffer is either a small on-stack buffer or a larger native heap buffer that is malloc'd and free'd during the native call.

    The relative pathname in the OpenJDK Java 11u codebase is ./src/java.base/share/native/libjava/io_util.c. Look for the readBytes method. (This is called via a native entry point method in ./src/java.base/share/native/libjava/FileInputStream.c)

    The IO_Read in io_util.c is a macro that aliases handleRead, which wraps the read(2) call.
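
    The flow described above can be modelled in Java to show why the temporary buffer does not count as buffering. This is only a sketch of the control flow, not the actual C code: NativeReadModel is a hypothetical name, the InputStream parameter stands in for the file descriptor, and the BUF_SIZE value of 8192 is an assumption.

    ```java
    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    class NativeReadModel {
        // Assumed to mirror the BUF_SIZE constant in io_util.c.
        static final int BUF_SIZE = 8192;

        /**
         * Java model of the native flow: pick a temporary buffer, issue one read,
         * copy the result into the caller's array, then drop the temporary buffer.
         */
        static int readBytes(InputStream fd, byte[] dest, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            // Stands in for the on-stack buffer (small reads) or malloc (large reads).
            byte[] temp = new byte[Math.max(BUF_SIZE, len)];
            // Stands in for IO_Read / read(2): exactly one call, asking for len bytes only.
            int n = fd.read(temp, 0, len);
            if (n > 0) {
                System.arraycopy(temp, 0, dest, off, n);
            }
            // temp goes out of scope here; no bytes are kept for the next call.
            return n;
        }

        public static void main(String[] args) throws IOException {
            InputStream fd = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5});
            byte[] dest = new byte[8];
            int n = readBytes(fd, dest, 2, 3);
            System.out.println(n + " bytes copied, next byte still in the source: " + fd.read());
        }
    }
    ```

    The model never asks for more than len bytes and keeps nothing between calls, so a one-byte request still costs one full trip to the source.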

    UPDATE

    I just noticed that you had already found a link to the Java 8u version of io_util.c. It behaves the same way as the 11u code that I described above. So maybe you just misread the C code you found?


    Why is FileInputStream then said to be unbuffered?

    It is "said to be" unbuffered because it really is unbuffered.

    And what additional buffering is BufferedReader using?

    You can see that by looking at the (Java) source code for BufferedReader.
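
    As a rough illustration of the idea (not the BufferedReader source), here is a minimal sketch of a reader that keeps a buffer alive between calls. TinyBufferedReader and its fills counter are hypothetical names introduced for this example only.

    ```java
    import java.io.IOException;
    import java.io.Reader;
    import java.io.StringReader;

    class TinyBufferedReader {
        private final Reader in;
        private final char[] buf;
        private int pos;    // index of the next char to hand out
        private int limit;  // number of valid chars currently in buf
        int fills;          // how many times the underlying reader was asked for data

        TinyBufferedReader(Reader in, int size) {
            if (size <= 0) {
                throw new IllegalArgumentException("size must be positive");
            }
            this.in = in;
            this.buf = new char[size];
        }

        /** Returns the next char, refilling the buffer only when it is empty. */
        int read() throws IOException {
            if (pos >= limit) {
                fills++;
                int n = in.read(buf, 0, buf.length);
                if (n <= 0) {
                    return -1; // end of stream
                }
                pos = 0;
                limit = n;
            }
            return buf[pos++];
        }

        public static void main(String[] args) throws IOException {
            TinyBufferedReader r = new TinyBufferedReader(new StringReader("hello world"), 4);
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
            System.out.println(sb + " read with " + r.fills + " underlying reads");
        }
    }
    ```

    The buffer is a field, so chars fetched by one call are still there for the next one. That persistence is what separates a buffered stream from a per-call temporary buffer.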


    Hint: rather than making (incorrect) assumptions about how the JVM native code implementation works, download and read the OpenJDK source code. It is available for anyone with enough disk space (1.2GB) to download.