Search code examples
javadata-structuresjvmjava-collections-api

JVM space complexity details: Singly Linked Lists vs Doubly Linked Lists


I have come across a curious situation where students (I'm a teaching assistant in this context) have to implement their own version of a Singly Linked List (SLL) and compare it empirically with the Java standard library implementation of a Doubly Linked List.

And that's where it gets weird: I've seen multiple students note that the DLL profiles at roughly 0.5% extra space utilization compared to an SLL containing an equal number of elements of the same type. All the while basic analysis of the data structures tells me that a SLL has 2 references per node (1 to the next element, and 1 to the contained value) whereas a DLL has 3 (an additional one reference to the previous element). In other words, that is 50% more space usage per node (disregarding the size of the contained value).

The contained values are mostly Integer value objects so I don't think the size of the contained values matters too much here.

What causes this 2 orders of magnitude difference? I'm not entirely sure that "JVM/collections libraries optimization" can cover the whole difference; otherwise it'd have to be one hell of a JVM/java std lib optimization.


Solution

  • The space used should be the same on the Oracle JVM/OpenJDK for a 64-bit JVM with 32-bit references (Compressed oops)

    For a Node with two references

    header: 12 bytes
    two references: 8 bytes
    alignment padding: 4 bytes
    

    total is 24 bytes per node as all objects are aligned by 8 byte offsets by default.

    For a Node with three references

    header: 12 bytes
    three references: 12 bytes
    alignment padding: 0 bytes
    

    the total is 24 bytes again.

    The real question is why did you see any difference at all. This is most likely due to inaccurate memory accounting.

    The JVM uses a TLAB (Thread Local Allocation Buffer) This allows the threads in a JVM to grab chunks of memory and allocate concurrently from those chunks. On the downside, you only see how much memory is used from the common Eden space i.e. you have no idea how much of each chunk is used.

    A simple way around this is to turn off the TLAB which gives you byte-by-byte memory account (at the cost of some performance)

    i.g. try -XX:-UseTLAB on the command line to disable the TLAB and you will see the size of every object allocated.