
Can you tell me what these two methods, which read bytes from an InputStream, do?


I am trying to understand an implementation of a B+ tree, and I don't understand what exactly this overloaded method does. Why does the first method, which takes an InputStream as its argument, declare four variables i1, i2, i3 and i4? In the second method, which takes an ObjectInput in as its argument, I understand that read() returns a byte from 0 to 255, but why is result = 251? It would be helpful if someone could explain each line and what it does.

First method:

    public final static int readLuposInt(final InputStream is) throws IOException {
        final int i1 = is.read();
        if (i1 < 0) {
            return i1;
        }
        final int i2 = is.read();
        if (i2 < 0) {
            return i2;
        }
        final int i3 = is.read();
        if (i3 < 0) {
            return i3;
        }
        final int i4 = is.read();
        if (i4 < 0) {
            return i4;
        }
        return (0xFF & i1) | ((0xFF & i2) | ((0xFF & i3) | (0xFF & i4) << 8) << 8) << 8;

    }

Overloaded method:

    public final static int readLuposInt(final ObjectInput in) throws IOException {
        final int i0 = in.read();
        if (i0 <= 251){
            return i0;
        }
        int result = 251;
        int offset = 1;
        for (int i = 1; i <= i0 - 251; i++) {
            result += in.read() * offset;
            offset <<= 8;
        }
        return result;
    }

Solution

  • You could have used a debugger to find the following result.

    The first method reads a 4-byte integer from an input stream. The integer is stored as a little-endian value, meaning the least significant byte comes first.

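    • the four bytes are read one after another, one variable per byte, which is why there are four variables i1 to i4
    • if the stream ends before all four bytes have been read, read() returns -1 and the method returns that -1 immediately
    • the complete integer is then built by shifting each more significant byte further to the left and OR-ing the bytes together; written out, the expression equals i1 | i2 << 8 | i3 << 16 | i4 << 24 (the 0xFF & masks change nothing here, because each value is already between 0 and 255 once the -1 checks have passed)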
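    The same steps can be written as a loop, which makes the little-endian layout easier to see. This is a minimal sketch, not the B+ tree's code: the class LittleEndianIntReader and the method readLittleEndianInt are hypothetical names, and the only assumption is that it should reproduce the behaviour of readLuposInt(InputStream), including returning -1 when the stream ends early.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class LittleEndianIntReader {

    // Hypothetical helper: same result as readLuposInt(InputStream),
    // but with explicit shifts instead of nested parentheses.
    public static int readLittleEndianInt(InputStream is) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            int b = is.read();               // 0..255, or -1 at end of stream
            if (b < 0) {
                return b;                    // stream ended early: pass -1 through
            }
            result |= (b & 0xFF) << shift;   // byte k lands in bits 8k..8k+7
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = { (byte) 0x78, (byte) 0x56, (byte) 0x34, (byte) 0x12 };
        int value = readLittleEndianInt(new ByteArrayInputStream(data));
        System.out.printf("0x%08X%n", value); // prints 0x12345678
    }
}
```

    Like the original, this version cannot tell end of stream apart from a stored value of -1 (bytes 0xFF 0xFF 0xFF 0xFF), because both cases return -1.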

    Example:

    • The number 2293742 is 0x0022FFEE in hex. Stored little-endian, its bytes appear in the stream in reverse order: 0xEE 0xFF 0x22 0x00
    • the bytes are read:
      • i1 = 0xEE
      • i2 = 0xFF
      • i3 = 0x22
      • i4 = 0x00
    • now the return value is computed:
      • (0xFF & i4) << 8 = (0xFF & 0x00) << 8 = 0x0000
      • ((0xFF & i3) | (0xFF & i4) << 8) << 8 = (0x22 | 0x0000) << 8 = 0x0022 << 8 = 0x002200
      • ((0xFF & i2) | ((0xFF & i3) | (0xFF & i4) << 8) << 8) << 8 = (0xFF | 0x002200) << 8 = 0x0022FF00
      • (0xFF & i1) | ((0xFF & i2) | ((0xFF & i3) | (0xFF & i4) << 8) << 8) << 8 = 0xEE | 0x0022FF00 = 0x0022FFEE
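    The worked example can be checked in code. The sketch below (class and method names are hypothetical) decodes the same four bytes twice: once with the JDK's ByteBuffer set to ByteOrder.LITTLE_ENDIAN, and once by repeating the nested expression step by step, so that each intermediate value matches a line of the calculation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class WorkedExampleCheck {

    // Decodes the four bytes with ByteBuffer, which supports little-endian order directly.
    public static int decodeLittleEndian(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    // Repeats the nested expression from readLuposInt one step at a time.
    public static int decodeByHand(int i1, int i2, int i3, int i4) {
        int step1 = (0xFF & i4) << 8;              // 0x00000000
        int step2 = ((0xFF & i3) | step1) << 8;    // 0x00002200
        int step3 = ((0xFF & i2) | step2) << 8;    // 0x0022FF00
        return (0xFF & i1) | step3;                // 0x0022FFEE
    }

    public static void main(String[] args) {
        byte[] stored = { (byte) 0xEE, (byte) 0xFF, (byte) 0x22, (byte) 0x00 };
        System.out.println(decodeLittleEndian(stored));           // 2293742
        System.out.println(decodeByHand(0xEE, 0xFF, 0x22, 0x00)); // 2293742
    }
}
```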

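    The second method reads a variable-length integer, so that small numbers take only one byte in the stream:

    • the first byte i0 is read. Values from 0 to 251 are stored directly in this byte, so if i0 <= 251, i0 is the number and is returned (at end of stream read() returns -1, which also takes this branch)
    • the four remaining byte values 252 to 255 serve as markers: i0 - 251 is the number of additional bytes that follow, from 1 to 4
    • the additional bytes are little-endian, as in the first method: offset starts at 1 and is multiplied by 256 (offset <<= 8) after each byte, so the bytes are weighted 1, 256, 65536 and 16777216
    • result starts at 251 because 251 is the largest value the single-byte form can hold; the additional bytes store how far the number lies above 251. For example, the bytes 252 0x10 decode to 251 + 16 = 267
    • unlike the first method, the loop does not check the additional reads for -1, so a stream that ends early produces a wrong value instead of -1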
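    To see this format from both sides, here is a round trip. The reader is the method from the question (logic unchanged, comments added); the writer writeVarInt is hypothetical, inferred only from what the reader expects, and may differ from how the B+ tree code actually writes its values (for instance, it assumes non-negative input). ObjectOutputStream and ObjectInputStream are used simply because they implement ObjectOutput and ObjectInput.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

public class VarIntSketch {

    // Hypothetical writer, inferred from the reader; not taken from the B+ tree code.
    static void writeVarInt(ObjectOutput out, int value) throws IOException {
        if (value < 0) {
            throw new IllegalArgumentException("negative values not supported: " + value);
        }
        if (value <= 251) {
            out.write(value);                     // small value: one byte, stored as-is
            return;
        }
        int rest = value - 251;                   // the extra bytes hold value - 251
        int n = 1;                                // number of extra bytes (1..4)
        while (n < 4 && (rest >>> (8 * n)) != 0) {
            n++;
        }
        out.write(251 + n);                       // marker byte 252..255
        for (int k = 0; k < n; k++) {
            out.write((rest >>> (8 * k)) & 0xFF); // least significant byte first
        }
    }

    // The reader from the question.
    static int readLuposInt(ObjectInput in) throws IOException {
        final int i0 = in.read();
        if (i0 <= 251) {
            return i0;                            // 0..251 (or -1 at end of stream)
        }
        int result = 251;
        int offset = 1;                           // 1, 256, 65536, 16777216
        for (int i = 1; i <= i0 - 251; i++) {
            result += in.read() * offset;
            offset <<= 8;
        }
        return result;
    }

    // Writes one value and reads it back through the Object streams.
    static int roundTrip(int value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            writeVarInt(out, value);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return readLuposInt(in);
        }
    }

    public static void main(String[] args) throws IOException {
        for (int v : new int[] { 0, 251, 252, 267, 70_000, Integer.MAX_VALUE }) {
            System.out.println(v + " -> " + roundTrip(v));
        }
    }
}
```

    Each line prints the original value next to the decoded one, for example "267 -> 267"; 267 is written as the two bytes 252 and 16, matching the decoding example above.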