Tags: java, reference, primitive, pool

Java Short constant pool versus short primitives


My solution requires many constants, so that in further development I can either create new primitives or reference existing data instead of creating new objects, which rules out possible mistakes later in the development process.

I've read that Java pools constants: when a new value is created, it is compared with the pool, and if such an object already exists, a reference to the existing one is returned instead of a new one being created.
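For the Short wrapper, the pooling described here is the cache behind Short.valueOf, which autoboxing calls. Its documentation guarantees that values from -128 to 127 are always cached; values outside that range may or may not be shared, so the sketch below only relies on the guaranteed range. The class and method names are illustrative.

```java
class ShortCacheDemo {
    // Autoboxing a short calls Short.valueOf, which reuses cached instances
    // for values in [-128, 127] (guaranteed by the Short.valueOf documentation).
    static boolean boxesToSameInstance(short value) {
        Short first = value;    // compiled as Short.valueOf(value)
        Short second = value;   // same call, same cache lookup
        return first == second; // reference comparison, not value comparison
    }

    // Value equality always holds, whether or not the instances are shared.
    static boolean boxesToEqualValues(short value) {
        Short first = value;
        Short second = value;
        return first.equals(second);
    }

    public static void main(String[] args) {
        System.out.println(boxesToSameInstance((short) 10));  // true: inside the cached range
        System.out.println(boxesToEqualValues((short) 1000)); // true: equals compares values
    }
}
```

Note that == on two Short references compares identity, so it is only reliable inside the guaranteed cache range; equals is the safe comparison everywhere else.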

While pooling might sound like the best approach, in my case I need many short variables, each of which takes 2 bytes as a primitive. If I used Short instead, I would lose 2 bytes per variable, because the reference itself takes 4 bytes.

Would it still make sense to use primitives, even considering that Short uses pooling? Plus, unboxing from Short to short also costs some resources (close to zero, but still). Note that the short values will have to be converted to a primitive 3-byte array from time to time, which is another point in favor of primitives.

public static final short USER = 10;

instead of

public static final Short USER = 10;
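One practical difference between the two declarations: public static final short USER = 10 is a compile-time constant (a "constant variable" in JLS terms, which covers only primitives and String), so javac inlines its value at every use site and it can be used as a case label. The Short version is a reference to a boxed object, so it is neither inlined nor allowed as a case label. A minimal sketch, where USER comes from the question and ADMIN and roleName are hypothetical:

```java
class RoleConstants {
    // Compile-time constant: javac copies the value 10 into every class that uses USER.
    public static final short USER = 10;
    // Hypothetical second constant, added only for illustration.
    public static final short ADMIN = 20;

    // Primitive constants are valid case labels; a Short constant would not compile here.
    static String roleName(short role) {
        switch (role) {
            case USER:
                return "user";
            case ADMIN:
                return "admin";
            default:
                return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(roleName(USER));       // prints "user"
        System.out.println(roleName((short) 99)); // prints "unknown"
    }
}
```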

Solution

  • The most important point here is that primitives are much cheaper than object wrappers, in terms of both time and memory.

    The pooling feature only becomes relevant if you have to use these values in contexts that require Object references, i.e. where they have to be wrapped ("boxed") into their object wrappers (look up autoboxing). If you can use them as primitive numbers all the time, that is the most efficient option.
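    To make those boxing contexts concrete: storing a short in a generic collection such as List<Short> silently calls Short.valueOf, and reading it back calls shortValue(), while a short[] holds the raw 16-bit values with no wrapper objects at all. A small sketch with illustrative class and method names:

    ```java
    import java.util.ArrayList;
    import java.util.List;

    class BoxingContexts {
        // Storing into a List<Short> forces autoboxing: each add() receives a Short reference.
        static List<Short> toBoxedList(short[] values) {
            List<Short> boxed = new ArrayList<>(values.length);
            for (short v : values) {
                boxed.add(v); // compiled as boxed.add(Short.valueOf(v))
            }
            return boxed;
        }

        // Summing a short[] works on primitives only; no wrapper objects are involved.
        static int sumPrimitive(short[] values) {
            int sum = 0;
            for (short v : values) {
                sum += v;
            }
            return sum;
        }

        // Summing the boxed list unboxes every element.
        static int sumBoxed(List<Short> values) {
            int sum = 0;
            for (Short v : values) {
                sum += v; // compiled as sum += v.shortValue()
            }
            return sum;
        }

        public static void main(String[] args) {
            short[] raw = {10, 20, 30};
            List<Short> boxed = toBoxedList(raw);
            System.out.println(sumPrimitive(raw) + " " + sumBoxed(boxed)); // prints "60 60"
        }
    }
    ```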

    Details:

    The Java language treats the primitive types (boolean, byte, char, short, int, long, float, double) differently from all other types (which are reference types). Primitives can live directly on the stack and can be manipulated directly by JVM instructions (there are dedicated instruction sets for int, long, float and double; byte, short, char and boolean values are mostly handled with the int instructions). Numerical constants are often embedded directly in JVM instructions, which means no additional memory read is needed to execute them. This structure maps more or less directly to native code on most hardware.
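    As an illustration of constants embedded in the instruction stream: small integer constants are encoded as the operand of bipush (one byte) or sipush (two bytes), while larger ones are loaded from the constant pool with ldc. The bytecode in the comments is what javac typically emits for these static methods (verify with javap -c on your own build); the method names are illustrative.

    ```java
    class EmbeddedConstants {
        // Typical javac output: iload_0, bipush 10, iadd, ireturn
        static int addTen(int v) {
            return v + 10;
        }

        // Typical javac output: iload_0, sipush 1000, imul, ireturn
        static int timesThousand(int v) {
            return v * 1000;
        }

        // Too large for sipush, so the constant comes from the constant pool:
        // iload_0, ldc #<index> (int 1000000), iadd, ireturn
        static int plusMillion(int v) {
            return v + 1_000_000;
        }

        public static void main(String[] args) {
            System.out.println(addTen(5));        // 15
            System.out.println(timesThousand(3)); // 3000
            System.out.println(plusMillion(1));   // 1000001
        }
    }
    ```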

    Reference types, on the other hand, cannot live on the stack and must be allocated on the heap (a Java language design choice, although the JIT compiler can sometimes optimize such allocations away). This means that each time you use them, extra instructions are needed to locate the instance, read its metadata, invoke a method or read a field before any real operation on the data can be performed.
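    One visible consequence of that extra step: arithmetic on a wrapper first has to follow the reference to the heap object, which is why a null Short makes a plain-looking addition throw. A minimal sketch with illustrative method names; incrementBoxedExplicit shows the call javac inserts for unboxing.

    ```java
    class HiddenDereference {
        // With a primitive parameter the addition operates directly on the value.
        static int incrementPrimitive(short value) {
            return value + 1;
        }

        // With a wrapper parameter, javac inserts value.shortValue(): the reference
        // must be followed to the heap object before the addition can happen.
        static int incrementBoxed(Short value) {
            return value + 1;
        }

        // The same method with the unboxing written out explicitly.
        static int incrementBoxedExplicit(Short value) {
            return value.shortValue() + 1;
        }

        public static void main(String[] args) {
            System.out.println(incrementPrimitive((short) 41)); // 42
            System.out.println(incrementBoxed((short) 41));     // 42
            try {
                incrementBoxed(null); // the hidden dereference fails on a null reference
            } catch (NullPointerException e) {
                System.out.println("NullPointerException");
            }
        }
    }
    ```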

    For example, say you have the function

    int add(int a, int b) { return a + b; }
    

    Apart from the calling convention (loading the arguments and returning the result), the body of the function is simply iadd, which translates to a single instruction on most CPUs.

    If you change the ints into Integers, the bytecode becomes:

       0: aload_1
       1: invokevirtual #16                 // Method java/lang/Integer.intValue:()I
       4: aload_2
       5: invokevirtual #16                 // Method java/lang/Integer.intValue:()I
       8: iadd
       9: invokestatic  #22                 // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
      12: areturn
    

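    which involves several method calls (Integer.intValue twice and Integer.valueOf, unless the JIT compiler inlines them) and possibly a heap allocation inside Integer.valueOf, i.e. many more native instructions than a single add.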
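    For reference, the boxed bytecode above corresponds to Java source like the following; the second method spells out the calls that unboxing and autoboxing insert. Both method names are illustrative, and the constant-pool indices (#16, #22) will differ between builds.

    ```java
    class BoxedAdd {
        // Instance method, matching the aload_1 / aload_2 loads in the bytecode
        // (local variable 0 holds 'this').
        Integer add(Integer a, Integer b) {
            return a + b;
        }

        // What javac generates for add(), written out by hand.
        Integer addDesugared(Integer a, Integer b) {
            return Integer.valueOf(a.intValue() + b.intValue());
        }

        public static void main(String[] args) {
            BoxedAdd calc = new BoxedAdd();
            System.out.println(calc.add(2, 3));          // 5
            System.out.println(calc.addDesugared(2, 3)); // 5
        }
    }
    ```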

    Random notes:

    IIRC, the Java Language Specification does not define how many bytes are used to store a short, boolean, etc.; it only defines their value ranges. A JVM implementation may use more bytes so that everything is aligned to the word length of the CPU. It is not unusual to see a boolean stored as a byte or as a 32-bit int purely for efficiency. The specification does say, however, that the result of arithmetic on types narrower than int is an int (binary numeric promotion).
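    The promotion rule is easy to see in code: adding two shorts yields an int, so assigning the result back to a short needs an explicit narrowing cast (which silently wraps around on overflow), whereas the compound assignment operator += performs that cast implicitly. The class and method names are illustrative.

    ```java
    class ShortArithmetic {
        static short addWithCast(short a, short b) {
            // short sum = a + b;   // would not compile: a + b has type int
            return (short) (a + b); // explicit narrowing keeps only the low 16 bits
        }

        static short addCompound(short a, short b) {
            a += b; // compound assignment performs the (short) cast implicitly
            return a;
        }

        public static void main(String[] args) {
            short big = 30_000;
            int widened = big + big;                   // computed as int
            System.out.println(widened);               // 60000
            System.out.println(addWithCast(big, big)); // -5536 (60000 - 65536)
            System.out.println(addCompound(big, big)); // -5536
        }
    }
    ```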

    All of this means that short doesn't really give you any memory savings unless you have very large short arrays (which are packed without gaps in memory).
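    A rough back-of-the-envelope comparison of array payload sizes, assuming 2 bytes per short element and 4-byte references (typical of a 64-bit HotSpot JVM with compressed references; 8 bytes without them). Array headers, alignment and the Short objects themselves are deliberately ignored, so the boxed figures are lower bounds, not measurements. Names are illustrative.

    ```java
    class ArrayFootprint {
        // Assumed sizes, not guaranteed by the Java Language Specification.
        static final int SHORT_ELEMENT_BYTES = 2;
        static final int COMPRESSED_REFERENCE_BYTES = 4;

        // Payload of a short[]: the values are stored inline.
        static long primitivePayload(long length) {
            return length * SHORT_ELEMENT_BYTES;
        }

        // Payload of a Short[]: only the references; the Short objects live elsewhere
        // (shared when they come from the valueOf cache, separate objects otherwise).
        static long referencePayload(long length, int referenceBytes) {
            return length * referenceBytes;
        }

        public static void main(String[] args) {
            long n = 1_000_000;
            System.out.println(primitivePayload(n));                             // 2000000
            System.out.println(referencePayload(n, COMPRESSED_REFERENCE_BYTES)); // 4000000
            System.out.println(referencePayload(n, 8));                          // 8000000
        }
    }
    ```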

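    And if you really, really want to use a pool of Shorts (the wrapper), the most efficient implementation is probably an array of size 65536, indexed by value. Here you trade 256 KB (with 4-byte references) or 512 KB (with 8-byte references) for the array, plus the Short objects themselves, in exchange for very efficient lookups.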
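    A minimal sketch of such a pool, assuming eager initialization and indexing by value minus Short.MIN_VALUE; the class and method names are hypothetical. Autoboxing already shares instances for -128..127 through Short.valueOf, so the table only adds sharing for the remaining values.

    ```java
    class ShortPool {
        // One slot per possible short value: 65536 references.
        private static final Short[] POOL = new Short[1 << 16];

        static {
            for (int i = 0; i < POOL.length; i++) {
                // Short.valueOf reuses its own cache for -128..127 and returns a new
                // object otherwise; each instance is created exactly once here.
                POOL[i] = Short.valueOf((short) (i + Short.MIN_VALUE));
            }
        }

        // Maps -32768..32767 onto array indices 0..65535.
        static Short of(short value) {
            return POOL[value - Short.MIN_VALUE];
        }

        public static void main(String[] args) {
            Short a = of((short) 1000);
            Short b = of((short) 1000);
            System.out.println(a == b);         // true: same pooled instance
            System.out.println(a.shortValue()); // 1000
        }
    }
    ```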