
Why does the narrowing conversion in a Java conditional expression only happen for a final int and not for a final short?


An online course I was taking mentioned that if a conditional expression in Java has a byte or short on one side and a final int on the other, then the int is narrowed to match that type, provided its value fits into the short or byte. This works only for a final int, though, and not for a final short.

I wonder what the logic behind that is: why does the compiler allow one narrowing conversion while preventing the other? Narrowing the short seems just as logical and correct, so why this limitation?

The following code compiles and runs fine:

public class Testing {
    public static void main(String[] args) {
        final int x = 99;
        byte y = 33;
        byte r = true ? x : y;
        System.out.println(r);
    }
}

But if I change x to a short, I get a compilation error:

public class Testing {
    public static void main(String[] args) {
        final short x = 99;
        byte y = 33;
        byte r = true ? x : y;
        // java: incompatible types: possible lossy conversion from short to byte
        System.out.println(r);
    }
}

Solution

  • EDIT: This answer had 4 upvotes and was accepted but was, essentially, wrong, or at least, irrelevant. Answer thoroughly revised with the actual reason for the behaviour you have observed.


    The behaviour you observe matches what the spec dictates the compiler should be doing. It's a bit weird perhaps in the way you've stated it, but the reasons for why it works this way are (mostly) sensible. We'll need to hit a few aspects of the JLS to fully break this case down.

    Constant expressions

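    Relevant section: JLS §15.28 – Constant Expressions (renumbered to §15.29 in more recent editions of the JLS, which is why the quote further down cites §15.29)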

    true ? x : y in your examples is not considered a constant expression. Actual literals (such as 5) are obviously constant. Simple names of local variables can be, if all 3 of these hold:

    • The variable is explicitly declared final (NOTE: Your byte y is not!)
    • It is initialized as it is declared (i.e. not final byte b; b = 10;, but final byte b = 10;). The two forms appear to be identical but they are not; the spec is clear.
    • The expression used to initialize the variable is, itself, a constant expression.

    Because y isn't a constant expression, true ? x : y isn't either. If we make it one by adding final to byte y = 33;, then both snippets work, and that's because a different section of the JLS kicks in: you are assigning a value that the compiler can determine, as per JLS rules, as 'actually that is constant and I know exactly what value that is' to a byte, in which case all concerns about the type of the expression are irrelevant; the question is simply: is that constant value (here, 99) within the range of what bytes accept? If yes, it works without complaint and there is no need for a cast. If not, error.
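    A minimal sketch of that fix, assuming a hypothetical class name ConstantConditionalDemo: once both operands are constant variables, the whole conditional is a constant expression and the short version compiles too. The second method shows a final variable that is assigned after its declaration and therefore does not count.

```java
class ConstantConditionalDemo {
    static byte bothFinal() {
        final short x = 99; // constant variable: final, initialized where declared, with a constant
        final byte y = 33;  // now a constant variable as well
        // true ? x : y is a constant expression of type short with value 99.
        // 99 fits in a byte, so the assignment needs no cast.
        byte r = true ? x : y;
        return r;
    }

    static byte assignedLater() {
        final byte b;       // final, but not initialized in its declaration
        b = 10;             // so b is NOT a constant variable
        final short x = 99;
        // byte r = true ? x : b; // would not compile: possible lossy conversion from short to byte
        return b;
    }

    public static void main(String[] args) {
        System.out.println(bothFinal());
        System.out.println(assignedLater());
    }
}
```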

    Language design note: One could consider a better JLS to upgrade §15.28 such that effectively final local variables also fall under the list of 'considered constant expressions', just like effectively final is good enough for use of local vars inside lambdas and anonymous inner classes. However, the notion of effectively final has downsides; it's a bit 'magical' that adding someVar = 10; to your code causes an error in a totally different line of code (where you use someVar in a lambda), and the place you'd look at first (where you declared the variable) is still not the line you added. In other words, effectively final is convenient but has downsides, and it's plausible that the lang designers did consider adding it and didn't. More likely, it was never considered. Point is, adding "effectively final now also counts for constant expressions" wouldn't be a clear and obvious upgrade to the JLS.

    'conditional expression' auto-adjustment

    Given that it's not a constant expression we run into a different and key clue from the JLS:

    JLS §15.25.2 – Numeric conditional expressions

    This section has a slightly bizarre rule in it: Given a conditional expression (That's: bool ? a : b) where:

    • a is int and b is byte or short or char (or vice versa)
    • and a (the int) is a constant expression
    • and the value of that constant fits in a byte/short/char

    Then the int is silently cast to a byte or short or char, and the type of the entire conditional expression is byte/short/char instead.

    Specifically, this rule applies only to ints:

    If one of the operands is of type T where T is byte, short, or char, and the other operand is a constant expression (§15.29) of type int whose value is representable in type T, then the type of the conditional expression is T.

    It doesn't even apply to longs! That's the 'oddness' you ran into; it explains why true ? x : y does work if x is final int x = 99;, because now x is [A] a constant, [B] of type int, and [C] within the -128 to +127 range of bytes. Therefore it is treated as if it were a byte, the entire conditional expression is of type byte, and no cast is needed to use it as the initializing expression for a byte variable declaration. With final short x = 99; the rule does not apply, so the conditional expression has type short, and a non-constant short cannot be assigned to a byte without a cast.
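    One way to observe the type the compiler picks is to pass the conditional expression to overloaded methods. The sketch below does that; the class name ConditionalTypeDemo and the typeOf overloads are hypothetical helpers made up for illustration.

```java
class ConditionalTypeDemo {
    // Overload resolution picks the method matching the static type of the argument.
    static String typeOf(byte v)  { return "byte"; }
    static String typeOf(short v) { return "short"; }
    static String typeOf(int v)   { return "int"; }

    static String constantIntThatFits() {
        final int x = 99;             // constant int, representable as a byte
        byte y = 33;
        return typeOf(true ? x : y);  // special rule applies: the conditional has type byte
    }

    static String constantIntTooBig() {
        final int x = 1000;           // constant int, but not representable as a byte
        byte y = 33;
        return typeOf(true ? x : y);  // falls back to numeric promotion: int
    }

    static String constantShort() {
        final short x = 99;           // constant, but a short: the special rule covers int only
        byte y = 33;
        return typeOf(true ? x : y);  // byte and short operands give type short
    }

    static String nonConstantInt() {
        int x = 99;                   // not final, so not a constant expression
        byte y = 33;
        return typeOf(true ? x : y);  // numeric promotion: int
    }

    public static void main(String[] args) {
        System.out.println(constantIntThatFits());
        System.out.println(constantIntTooBig());
        System.out.println(constantShort());
        System.out.println(nonConstantInt());
    }
}
```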

    Language design note: This choice seems bizarre and indeed it would probably make for a slightly better JLS if this rule was applied more generally, i.e. any constant expression of any primitive type will automatically narrow itself to the smaller type in the conditional expression if it is both constant and its value fits. However, doing that now is technically backwards incompatible. After all, given:

    void foo(short a) { System.out.println("1"); }
    void foo(byte a) { System.out.println("2"); }
    
    ....
    final short constantShort = 10;
    byte nonConstantByte = 5;
    foo(true ? constantShort : nonConstantByte);
    

    Compiles today and prints '1'. Making this language change means the same code still compiles and would print '2' instead. That kind of language update is expensive; you now have to tell users of java that upgrading might, in exotic circumstances, silently break your code. I sure hope you have a thorough test suite!

    Thus, while in a cleanroom situation where no java code exists at all, I'd vote to update the JLS, given where we are, which is that java 1.0 has been released (decades ago, but that isn't relevant; even if it was released yesterday; you as a language designer have to stand behind your own claim that your language is now deemed 'stable'!) - the cost of changing it now exceeds the value of having a slightly saner spec.

    There's also an argument to be made that the current spec, despite the fact that it results in this weird behaviour, is 'better' than the hypothetical alternative where all constants auto-narrow.

    ints are magic. They are the type of numeric literals. The expression 5 in java is always an int, and the rest of the lang spec has rules in place to indicate that byte b = 5; nevertheless compiles without needing a cast, despite the fact that the initializing expression is fundamentally an int and not a byte. That's how it's set up and this makes sense; if I have:

    void foo(int a) { System.out.println("1"); }
    void foo(byte a) { System.out.println("2"); }
    
    foo(127);
    foo(128);
    

    It'd be rather surprising if that printed 2 1, but that's what would happen if java treated all integer literals as being the smallest primitive type that fits.

    Given that ints are magic in this way, one would expect e.g.:

    byte a = 5;
    byte c = true ? 10 : a;
    

    to just work and not complain about 'oh dear, you have to cast that literal '10' to a byte first, otherwise there is possibly lossy conversion' - a ridiculous claim; the compiler obviously knows that no lossy conversion is possible here; it knows that '10' converts to a byte without loss.

    One way out is to make the spec more complicated by not piggybacking this specific feature idea off of JLS §15.28 (Constant Expressions), instead defining this auto-narrowing behaviour solely for int literals and nothing else. And that'd be shortsighted. It would mean a simple attempt to tell your IDE to run a refactor script to turn that 10 into a constant (public static final int DEFAULT_VAL = 10;) now means a compiler error shows up, or the refactor script needs to be quite smart and add a cast. So, you get a language that is worse in this specific way, and with a more complicated lang spec, in order to avoid the situation you asked about which is arguably less likely than the 'extract constant' action described here.
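    The 'extract constant' scenario can be sketched as follows. The class and method names are hypothetical; DEFAULT_VAL is the constant from the paragraph above. Both methods compile without a cast, because the named constant is a constant expression of type int just like the literal it replaced.

```java
class ExtractConstantDemo {
    // A static final field initialized with a constant expression is a constant variable.
    static final int DEFAULT_VAL = 10;

    static byte withLiteral(boolean useDefault) {
        byte a = 5;
        byte c = useDefault ? 10 : a;          // int literal that fits in a byte: type is byte
        return c;
    }

    static byte withNamedConstant(boolean useDefault) {
        byte a = 5;
        byte c = useDefault ? DEFAULT_VAL : a; // same rule applies after the refactor
        return c;
    }

    public static void main(String[] args) {
        System.out.println(withLiteral(true));
        System.out.println(withNamedConstant(true));
        System.out.println(withNamedConstant(false));
    }
}
```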

    Addendum - 'superior primitives'

    I assume the spirit of this question is merely curiosity about the language spec. But just in case it's about a more pragmatic 'this really is bothering me when writing my code, how should I go about writing code like this', the answer is to stop using byte/short/char so much.

    They are the inferior primitives. You should rarely use them. boolean is inferior too, but the situations where inferior primitives should be used are quite common when booleans are involved, so, they get a pass on the whole 'you mostly should not use those at all' rule of thumb.

    At the JVM level, there are only 4 primitive types for the vast, vast majority of operations: double, float, long, and int. The rest are 'inferior'. For example, there are bytecode instructions for all 4 of those to add 2 together: DADD, FADD, LADD, and IADD, which all pop 2 values (respectively: doubles, floats, longs, ints) off the stack, add them together, and push one value of the same type back onto the stack. There is no BADD, CADD, SADD or ZADD to add 2 bytes/chars/shorts/booleans together. This explains that this code:

    byte a = 1; // no `final` so not a 'constant variable'
    byte b = 2;
    byte c = a + b;
    

    In fact doesn't even compile. Because bizarrely, the type of the expression a + b is int... even though both a and b are byte! It makes sense once you realise that at the bytecode level java is forced to upgrade those bytes to int, then do IADD, then convert the resulting int back to a byte because there is no BADD - the JVM has no instruction whatsoever to add 2 bytes together. The only way is the convoluted 'convert both args to an int, add those, convert back' and this leaks into the JLS in this way.
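    A small sketch of the two usual ways to get such an addition to compile, using a hypothetical class name ByteAdditionDemo: either cast the int result back explicitly, or make both operands constant variables so that the sum is a constant expression the compiler can range-check.

```java
class ByteAdditionDemo {
    static byte addWithCast() {
        byte a = 1;
        byte b = 2;
        // byte c = a + b;        // does not compile: a + b has type int
        byte c = (byte) (a + b);  // explicit narrowing cast of the int result
        return c;
    }

    static byte addConstants() {
        final byte a = 1;         // constant variables this time
        final byte b = 2;
        byte c = a + b;           // constant expression of type int with value 3, which fits in a byte
        return c;
    }

    public static void main(String[] args) {
        System.out.println(addWithCast());
        System.out.println(addConstants());
    }
}
```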

    There is no point to the inferiors for the vast, vast majority of code. Having final byte x = 5; is not more efficient, either in CPU time or in memory taken: a byte local variable occupies a full 32-bit slot in the JVM just like an int does, and a lone byte field usually saves nothing because the JVM pads objects to an alignment boundary. That seems wasteful but it is what it is, and almost all languages do this; CPUs really do not like working with data that isn't word aligned, and just about every processor out there today has 64-bit words.

    The only times you should ever use byte, short, boolean, or char are in the following scenarios:

    • It is in your API signatures (so, return type or param type) and types also serve as documentation; the return type / parameter represents one of these types.
    • You are declaring a byte[]. While a byte field is less efficient than an int (or even a long!) in pretty much every imaginable way (less fast, and no smaller), a byte[] is 'compressed'; new byte[1000] takes about 1000 bytes of memory whereas new long[1000] takes about 8000.
    • You need to invoke a specific variant of an overloaded method that takes one of the inferiors.

    Bad reasons:

    • You really really need the overflow behaviour of one of the smaller types. Don't use the smaller type; instead, make it explicit that your algorithm requires the proper overflow, for example by using & 0xFF (which 'byte-overflows' any short/int/long so that the value you end up with is in the 0-255 range). This also lets you have 'unsigned bytes', effectively.

    • Performance. They are no faster and can be slower: the JVM widens them to int for arithmetic anyway and then has to narrow the result back.

    • Memory size. In theory you can 'optimize' here (java.lang.String does this; it added some bookkeeping fields because they were 'free', given that java word-aligns objects), but getting an actual benefit requires analysing where the word-alignment boundary ends up taking you (e.g. a class def with 2 ref fields + 1 byte field ends up occupying as much RAM as 2 refs and 1 int for all JVMs I've ever heard of). The odds that you need to micromanage your object sizes to this extent are small. Are you writing java.lang.String? No? Then you probably should not worry about this.
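    The & 0xFF approach from the first bullet can be sketched like this, with int variables throughout. The class and method names are hypothetical and chosen only for illustration.

```java
class ByteMaskDemo {
    // Keeps a counter in the 0-255 range, wrapping around like an unsigned byte would.
    static int wrappingAdd(int counter, int step) {
        return (counter + step) & 0xFF;
    }

    // Reads a signed Java byte as its unsigned value (0-255).
    static int toUnsigned(byte raw) {
        return raw & 0xFF; // raw is widened to int first, then the upper bits are masked off
    }

    public static void main(String[] args) {
        System.out.println(wrappingAdd(250, 10));      // 260 & 0xFF
        System.out.println(toUnsigned((byte) 0xF0));   // the byte value is -16
    }
}
```

    The masking makes the wrap-around visible at the point where it happens, instead of hiding it in the declared type of a variable.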

    Hence, you should presumably just write int c = 5; here even though c semantically represents a byte value.