Search code examples
javabinarybytebitwise-operatorsbit-shift

Why does b >>> 1 always equals -1 when b is a byte and has value -1 in Java?


I know that >>> is unsigned shift-right, and since byte in Java is signed 8-bit then -1 will be represented as 1111 1111. As a result, for the code below:

byte b = (byte) -1;
b >>>= 1;
System.out.println(b);

Shouldn't it print 127 (0111 1111 in binary)? But the result I get is always -1, no matter how many times I perform >>> to b (e.g. in a loop).

Can anyone explain why and in what way I can get the result I expect (i.e. to perform unsigned right shift in which -1 shifted right once equals 127 for a byte in Java)? Is my understanding wrong? Thanks a lot!


Solution

  • You're running into a chain of really weird java design choices that make this code 'lie' (it appears to do one thing but it actually won't do that thing). Let's break it down:

    Lesser known java aspect 1: byte, short, char, and boolean are inferior

    In java, byte, short, char, boolean are the inferior primitives, with int, long, double and float being the superior primitives. There are no opcodes for the inferiors beyond array store and array load. There is no opcode in java that adds 2 bytes together. It just does not exist.. It is literally impossible to do this, given that javac has no idea what to generate. So, instead:

    byte b = 5;
    byte c = 10;
    byte d = b + c;
    

    Is actually taken to mean:

    byte b = 5;
    byte c = 10;
    byte d = (int) b + (int) c;
    

    And, thus, this is a compiler error - the result of int + int is int, and trying to assign an int to a byte is not allowed without explicit casting.

    You can trivially have a look at any list of java bytecodes. You'll find FADD, DADD, IADD, and LADD (respectively, 'float add', 'double add', 'int add' and 'long add'). But you won't find byte/short/char variants; there is no BADD, CADD, or SADD.

    Lesser known java aspect 2: The compound assignment operators auto-cast.

    this: a += b; is not actually identical to a = a + b; at all! I know that's what you were taught, but this is incorrect. It is, in fact, equal to a = (byte) (a + b); where that byte is the type of a itself:

    double d = 5.5;
    int x = 5;
    x = x + d; // obviously, compiler error. Explicit cast needed.
    

    versus:

    double d = 5.5;
    int x = 5;
    x += d;
    

    Totally legal. And results in x being 10. The full flow is:

    • x is 5. This is converted to double because java can only do math between 2 things of the same type.
    • d is 5.5, already a double, nice.
    • DADD bytecode is executed; the result of this is 10.5.
    • This is cast to int, resulting in 10, which is assigned to x.

    Run javap if you want to see this.

    Combining the two

    Thus, let's break down your code:

    byte b = (byte) -1;
    

    This makes the int -1 (32 '1' bits), then casts it to byte, resulting in b having the value 1111 1111. So far, so good, nothing surprising here.

    b >>>= 1;
    

    Hoo boy. This looks simple but is very complicated. It's syntax sugar for:

    b = (byte) (((int) b) >>> 1);
    

    To break that down:

    • (int) b results in 32 1-bits. b is -1 in byte, casting that to int produces -1 the int, which is 32 1-bits.
    • >>> 1 then makes that a zero, followed by 31 1 bits.
    • this is then cast back to a byte, which.. is -1 - 8 1-bits.

    So how do I do this?

    Don't use byte, basically. Don't ever use the inferior primitives except under the following circumstances:

    • You use them in signatures (i.e. the return type or param types of methods) as API doc, essentially.
    • You make an array. byte[] is useful and really does only take 1 byte per item.

    For all other things (such as single fields, and local vars aimed at math, such as what you are doing), don't. They don't act like you think they act, as you have found out here.

    Just go with ints. If you want an 'unsigned byte', just make an int variable and use that:

    byte b = (byte) -1;
    int x = b & 0xFF; // x is now 0000...0000 1111 1111.
    System.out.println(x); // prints 255
    x >>= 1;
    System.out.println(x); // prints 127
    

    byte as a single value (not byte[] - just byte) is not more efficient in any way than int or likely even long, on 64-bit architecture. Java does not assign less memory (because all things have to align on word boundaries), the operation is not faster (CPUs work in 64-bit chunks already).