scala bit-manipulation byte unsigned

Unsigned shift right in Scala


I have an Int that I split into 4 bytes using ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt(word).array(). Now I would like to map over those bytes with bytes.map(w => w.toInt/16), dividing each value by 16 to get a result in the range [0, 15] (since w is 1 byte). However, if the value of w is something like 0xcc, I get a negative number (-3) instead of 12.

I am currently using (BigInt(Array[Byte](0, w))/16).toInt, but it feels quite slow and is not an idiomatic way of writing Scala code.

I tried using w >>> 4 but the result is some large integer.


Solution

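  • In your case w >>> 4 is almost the right answer, but by printing numbers as signed decimals you aren't seeing it. A Byte is sign-extended to an Int before the shift, so for a negative byte the bits you want end up in the low 4 bits with the extended sign bits still above them, and the whole thing prints as a large decimal. Masking the result, for example (w >>> 4) & 0x0F, leaves only the nibble you want.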

    On the JVM the integer types Byte, Short, Int and Long are all signed (Char is the only unsigned one). That's why you can see negative bytes. A JVM byte's range is -128 to 127 (8 bits, two's complement representation).

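    Dividing (/) is a slightly different operation from bit shifting (>> or >>>), and the difference shows when the highest bit is set and the number is negative: division truncates toward zero, >> keeps the sign and rounds toward negative infinity, and >>> shifts zeros in from the left.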
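A minimal sketch of how these operations differ for the byte 0xcc from the question. The object and value names are made up for illustration:

```scala
object ShiftVsDivide {
  // Bit pattern 1100 1100, which is -52 when read as a signed Byte.
  val w: Byte = 0xcc.toByte

  // Integer division truncates toward zero.
  val divided: Int = w / 16

  // Sign-propagating shift rounds toward negative infinity.
  val arithmetic: Int = w >> 4

  // w is widened to Int (0xFFFFFFCC) before the shift, so sign-extension bits remain.
  val logical: Int = w >>> 4

  // Clearing the sign extension first leaves only the original 8 bits.
  val masked: Int = (w & 0xff) >>> 4

  def main(args: Array[String]): Unit = {
    println(divided)    // -3
    println(arithmetic) // -4
    println(logical)    // 268435452 (0x0FFFFFFC)
    println(masked)     // 12
  }
}
```

Only the masked variant gives the value the question is after, because it discards the bits that widening to Int added.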

    If you want to perform bit-wise operations, I would suggest working with methods like:

    • java.lang.Byte.toUnsignedInt(byte: Byte) to convert a Byte to an Int holding its unsigned value, 0 to 255 (note that the resulting Int is still a signed type for every numerical operation on the JVM; print the values before and after the conversion to make sure it does not break anything)
    • f"0x${value}%02X" or java.lang.Integer.toHexString(value) to print bytes
    • avoiding arithmetic operators like +, -, / and *, and sticking to bitwise operators (&, |, >>, >>>, and so on)

    and not trusting the numbers in their default signed decimal form, because it can be confusing:

    // REPL examples
    
    // 1000 0000 0000 0000 0000 0000 0000 0000 (-2147483648)
    @ java.lang.Integer.toHexString(Int.MinValue)
    res1: String = "80000000"
    
    // 0111 1111 1111 1111 1111 1111 1111 1111 (2147483647)
    @ java.lang.Integer.toHexString(Int.MaxValue)
    res2: String = "7fffffff"
    
    // 1111 1111 1111 1111 1111 1111 1111 1111 (-1)
    @ java.lang.Integer.toHexString(Int.MinValue | Int.MaxValue)
    res3: String = "ffffffff"
    
    // 0000 0000 0000 0000 0000 0000 0000 0000 (0)
    @ java.lang.Integer.toHexString(Int.MinValue & Int.MaxValue)
    res4: String = "0"
    
    // 1000 0000 (-128)
    @ f"0x${Byte.MinValue}%02X"
    res5: String = "0x80"
    
    // 0111 1111 (127)
    @ f"0x${Byte.MaxValue}%02X"
    res6: String = "0x7F"
    
    // Byte | Byte is widened to Int, so all 32 bits are set, not just 8 (-1)
    @ f"0x${Byte.MinValue | Byte.MaxValue}%02X"
    res7: String = "0xFFFFFFFF"
    
    // 0000 0000 (0)
    @ f"0x${Byte.MinValue & Byte.MaxValue}%02X"
    res8: String = "0x00"
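
Applied to the question, the suggestions above could look roughly like this. It is a sketch, not the only way to write it; HighNibbles, toBytes and highNibbles are hypothetical names, and the sample word is arbitrary:

```scala
import java.nio.{ByteBuffer, ByteOrder}

object HighNibbles {
  // Split an Int into 4 big-endian bytes, exactly as in the question.
  def toBytes(word: Int): Array[Byte] =
    ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt(word).array()

  // Upper 4 bits of each byte. toUnsignedInt yields 0..255, so the shift yields 0..15.
  def highNibbles(word: Int): Array[Int] =
    toBytes(word).map(b => java.lang.Byte.toUnsignedInt(b) >>> 4)

  def main(args: Array[String]): Unit = {
    val word = 0xCC10F07F // bytes CC, 10, F0, 7F
    println(highNibbles(word).mkString(", ")) // 12, 1, 15, 7
  }
}
```

Writing b & 0xff instead of java.lang.Byte.toUnsignedInt(b) has the same effect; which one reads better is a matter of taste.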
    

    I would say that low-level operations are fairly specific, so write them whichever way is most readable. I wouldn't bother making such code idiomatic as long as it is easy to maintain. Just make sure you can justify working with error-prone low-level code rather than using existing high-level components. And don't make assumptions about speed without benchmarks.

    Converting an Int into 4 bytes could also be done like this:

    val int: Int = ...
    
    // 8 low bits set (0xFF), i.e. one byte wide
    val mask = Integer.parseInt("11111111", 2)
    
    // not all of these are necessary but I wrote it like that for consistency
    val b1 = (int >>> 24) & mask
    val b2 = (int >>> 16) & mask
    val b3 = (int >>> 8)  & mask
    val b4 = (int >>> 0)  & mask
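
To check a shift-and-mask split against the ByteBuffer approach from the question, something like the following sketch could be used. It assumes a one-byte mask (0xff) and shifts of 24, 16, 8 and 0 bits; SplitInt, splitBytes and splitWithBuffer are hypothetical names:

```scala
import java.nio.{ByteBuffer, ByteOrder}

object SplitInt {
  // The four bytes of `value`, most significant first, each as an Int in 0..255.
  def splitBytes(value: Int): List[Int] =
    List(24, 16, 8, 0).map(shift => (value >>> shift) & 0xff)

  // The same split via ByteBuffer, with each Byte converted to its unsigned value.
  def splitWithBuffer(value: Int): List[Int] =
    ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt(value).array()
      .toList.map(b => b & 0xff)

  def main(args: Array[String]): Unit = {
    val value = 0xCAFEBABE
    println(splitBytes(value).map(b => f"0x$b%02X").mkString(" ")) // 0xCA 0xFE 0xBA 0xBE
    println(splitBytes(value) == splitWithBuffer(value))           // true
  }
}
```

Both versions agree for any Int; whether one is faster than the other would need a benchmark.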