java file-io fileinputstream bufferedinputstream

FileInputStream read byte by byte or block?

The explanation given in "Why is using BufferedInputStream to read a file byte by byte faster than using FileInputStream?" for why BufferedInputStream (BIS) is faster than FileInputStream (FIS) is:

With a BufferedInputStream, the method delegates to an overloaded read() method that reads 8192 bytes and buffers them until they are needed, while FIS reads a single byte.

As I understand it, a disk is a 'block device': it always reads and writes entire blocks, even if the request is for a smaller amount of data. Isn't that so? If it is, then both FIS and BIS end up reading a complete block, not a single byte (as stated for FIS). So how is BIS faster than FIS?

Solution

The Java API of InputStream is what it is. Specifically, it has this method:

int read() throws IOException

which reads a single byte (it returns an int, so that it can return -1 to indicate EOF).
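To make the "returns an int so it can signal EOF" point concrete, here is a minimal sketch. It uses an in-memory ByteArrayInputStream as a stand-in for a file, and the class and method names (ReadReturnSketch, readTwice) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadReturnSketch {
    /** Calls read() twice on a stream that holds only the single byte 0xFF. */
    static int[] readTwice() throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] { (byte) 0xFF });
        int first = in.read();  // the byte 0xFF comes back as the int 255
        int second = in.read(); // nothing left, so EOF is reported as -1
        return new int[] { first, second };
    }

    public static void main(String[] args) throws IOException {
        int[] r = readTwice();
        System.out.println(r[0] + " " + r[1]); // prints "255 -1"
    }
}
```

If read() returned a byte instead, the value 0xFF (which is -1 as a signed byte) could not be told apart from end-of-file.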

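So, if you try to read a SINGLE BYTE from a file, it'll try to do exactly that. With a block device like a hard disk, the read underneath is likely for an entire block, of which everything except that one byte gets chucked. So if you call read() 8192 times, the same block is fetched over and over, 8192 times, each time chucking away 8191 bytes and giving you just the one you want: 8192 × 8192 is about 67 million bytes moved to deliver 8192. Ouch. Not very efficient. (In practice the OS usually keeps the block cached in memory after the first read, so the disk itself is not hit 8192 times, but every single-byte read() on a FileInputStream is still a separate trip to the operating system, and that per-call overhead is what hurts.)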

Assuming, for the sake of argument, that the kernel, disk, etc. all work in blocks of 8192 bytes, there is essentially no performance difference between new BufferedInputStream(new FileInputStream(...)) and just the plain new FileInputStream(...), IF you use something like:

byte[] buffer = new byte[8192];
int n = in.read(buffer); // n = bytes actually read (may be fewer than 8192), or -1 at EOF

Now even a plain-Jane, unbuffered new FileInputStream ends up reading that block off the disk just once.
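A single read(buffer) call only fetches one chunk, so reading a whole file this way needs a loop. The following is a rough sketch of the usual idiom, not the only way to do it. The class and method names (BlockReadSketch, readAll) are hypothetical, and main uses an in-memory stream as a stand-in where you would normally pass a FileInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class BlockReadSketch {
    /** Reads the stream to EOF in chunks of up to 8192 bytes; returns the total byte count. */
    static long readAll(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        // read(buffer) returns how many bytes it actually stored, or -1 at EOF
        while ((n = in.read(buffer)) != -1) {
            // only buffer[0..n) holds valid data for this iteration
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for: new FileInputStream("some/file")
        try (InputStream in = new ByteArrayInputStream(new byte[20000])) {
            System.out.println(readAll(in));
        }
    }
}
```

With this pattern the stream is asked for data once per chunk rather than once per byte, which is why wrapping it in a BufferedInputStream gains you nothing.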

BufferedInputStream does that 'under the hood' even if you use the single-byte form of read(), and will then feed you data from that byte array for the next 8191 calls to read(). That's all BufferedInputStream does.
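You can watch this happen by counting how often the underlying stream is actually asked for data. The sketch below is an illustration under assumptions: CountingInputStream and ReadCallDemo are made-up helper names, and an in-memory ByteArrayInputStream stands in for the file so the example runs anywhere.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: counts every read request that reaches the wrapped stream. */
class CountingInputStream extends FilterInputStream {
    long calls = 0;

    CountingInputStream(InputStream in) { super(in); }

    @Override public int read() throws IOException {
        calls++;
        return super.read();
    }

    @Override public int read(byte[] b, int off, int len) throws IOException {
        calls++;
        return super.read(b, off, len);
    }
}

class ReadCallDemo {
    /** Drains the stream with the single-byte read() until EOF. */
    static void drainByteByByte(InputStream in) throws IOException {
        while (in.read() != -1) { /* discard the byte */ }
    }

    /** Underlying read requests when the caller reads byte by byte, unbuffered. */
    static long callsUnbuffered(int size) throws IOException {
        CountingInputStream counter = new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        drainByteByByte(counter);
        return counter.calls;
    }

    /** Same caller behaviour, but with a BufferedInputStream in between. */
    static long callsBuffered(int size) throws IOException {
        CountingInputStream counter = new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        drainByteByByte(new BufferedInputStream(counter));
        return counter.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered: " + callsUnbuffered(16384) + " underlying reads");
        System.out.println("buffered:   " + callsBuffered(16384) + " underlying reads");
    }
}
```

Unbuffered, the underlying stream sees one request per byte (plus the final one that reports EOF). Buffered, it sees only a handful of large requests, even though the calling code is identical.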

If you are using the read() (one byte at a time) variant, or the byte-array variant of read but with really small byte arrays, then BufferedInputStream makes sense. Otherwise it adds nothing, and there is no need to wrap the stream in it.

NB: As far as I know, Java makes no guesses about what the disk's block size is; BufferedInputStream just uses a reasonable default buffer size (8192 bytes). The effect is the same: if you read a single byte at a time, wrapping your file stream in a buffered stream improves performance by a large factor (1000x or more); if you use the byte-array variant with a sensibly sized array, it makes no difference whatsoever.
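If you want to check the speedup on your own machine, a crude timing sketch like the one below will do. Treat it as a rough probe, not a proper benchmark: it does a single run with no JIT warm-up, the file size (256 KiB) and the scratch file name are arbitrary choices, and ReadTimingSketch and countBytes are made-up names. The numbers you get will depend on your OS and hardware.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class ReadTimingSketch {
    /** Reads the stream one byte at a time and returns how many bytes it saw. */
    static long countBytes(InputStream in) throws IOException {
        long n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("read-timing", ".bin"); // scratch file, deleted below
        try {
            Files.write(file, new byte[256 * 1024]);

            long t0 = System.nanoTime();
            long plain;
            try (InputStream in = new FileInputStream(file.toFile())) {
                plain = countBytes(in); // every read() goes to the OS
            }
            long t1 = System.nanoTime();
            long buffered;
            try (InputStream in = new BufferedInputStream(new FileInputStream(file.toFile()))) {
                buffered = countBytes(in); // most read() calls are served from memory
            }
            long t2 = System.nanoTime();

            System.out.printf("unbuffered: %d bytes in %d ms%n", plain, (t1 - t0) / 1_000_000);
            System.out.printf("buffered:   %d bytes in %d ms%n", buffered, (t2 - t1) / 1_000_000);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```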