
If -1 is the standard EOF byte, why does DataInputStream do this?


The readInt() function from java.io.DataInputStream is as follows:

public final int readInt() throws IOException {
    int ch1 = in.read();
    int ch2 = in.read();
    int ch3 = in.read();
    int ch4 = in.read();
    if ((ch1 | ch2 | ch3 | ch4) < 0)
        throw new EOFException();
    return ((ch1 << 24) + (ch2 << 16) + (ch3 << 8) + (ch4 << 0));
}

In the function, you can see that it throws an EOFException (end-of-file exception) when (ch1 | ch2 | ch3 | ch4) < 0. But I was under the impression that the standard EOF byte was -1 (that is, 255, 0xFF, 0b11111111, or whatever notation you prefer). This function, however, only checks whether any of the values is negative. So what's going on here?


Solution

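  • EOF is -1 as an int, and an int is 32 bits wide. in.read() returns each byte as an int in the range 0 to 255, so the byte 0xFF comes back as 255, which is not equal to the int -1.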

    So what is the readInt() method doing?

    int ch1 = in.read();
    int ch2 = in.read();
    int ch3 = in.read();
    int ch4 = in.read();
    

    These lines read the raw bytes as ints. The low 8 bits of each int hold the byte itself. A byte has no spare room to represent "no value", so the wider int range is used for that: at end of stream read() returns -1, which has the sign bit set, while every real byte comes back as a value from 0 to 255 with the sign bit clear.

    So these lines optimistically read all four bytes, which will be combined to create the final int value. This target int value is not to be confused with the int representation of each individual byte.

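    if ((ch1 | ch2 | ch3 | ch4) < 0)
    

    is an efficient way to check that all four reads actually returned a byte. Each value is either in the range 0 to 255 (sign bit clear) or -1 (sign bit set), so the OR of the four is negative exactly when at least one read hit end of stream. The alternative would be to read one byte, check it, then read the next and check that. Having one branch is generally cheaper than having four on modern CPUs.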

    return ((ch1 << 24) + (ch2 << 16) + (ch3 << 8) + (ch4 << 0));
    

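    Once the if check has passed without throwing, we know that each int representation of a byte is non-negative, that is, between 0 and 255, and thus the 24 extra bits within the int that are not used by the byte value (32 - 8 bits) are all zero. So we shift each value into its place and combine them, giving us the final value. Note that this final value can itself be negative: four 0xFF bytes are valid data and decode to the int -1.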
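
To see the points above in action, here is a minimal sketch. The class name ReadIntDemo and the helper readIntLike are made up for illustration; the helper just mirrors the readInt() logic quoted in the question. It only relies on InputStream.read() returning 0 to 255 for a byte and -1 at end of stream.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadIntDemo {
    // Hypothetical helper that mirrors the readInt() logic from the question.
    static int readIntLike(InputStream in) throws IOException {
        int ch1 = in.read();
        int ch2 = in.read();
        int ch3 = in.read();
        int ch4 = in.read();
        // read() yields 0..255 or -1, so the OR is negative only if
        // at least one of the four reads hit end of stream.
        if ((ch1 | ch2 | ch3 | ch4) < 0)
            throw new EOFException();
        return (ch1 << 24) + (ch2 << 16) + (ch3 << 8) + ch4;
    }

    public static void main(String[] args) throws IOException {
        // A data byte 0xFF comes back as 255; only end of stream comes back as -1.
        InputStream one = new ByteArrayInputStream(new byte[] {(byte) 0xFF});
        System.out.println(one.read()); // 255
        System.out.println(one.read()); // -1

        // Four 0xFF bytes are valid data and decode to the int -1, no exception.
        byte[] allOnes = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        System.out.println(readIntLike(new ByteArrayInputStream(allOnes))); // -1
        System.out.println(
            new DataInputStream(new ByteArrayInputStream(allOnes)).readInt()); // -1

        // Only three bytes available: the fourth read() returns -1.
        try {
            readIntLike(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        } catch (EOFException e) {
            System.out.println("EOF");
        }
    }
}
```

So a byte of 0xFF in the data never triggers the exception; only a read() that actually ran out of input does.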