Search code examples
javaconcurrencyconcurrenthashmap

java ConcurrentHashMap - How does RESIZE_STAMP_BITS/RESIZE_STAMP_SHIFT work in a resize operation?


I am trying my best to understand how ConcurrentHashMap works under the hood.

It seems that during resizes there is an entire encoding scheme happening within sizeCtl variable.

Some speculations are saying that the lower 16 bits signify the number of threads, other speculations specify that there is a point system counter implemented, ie. +1 when a thread is doing the resizing and -1 for when a thread is leaving the resizing.

https://stackoverflow.com/a/52668122/7134737

https://stackoverflow.com/a/53477058/7134737

Can someone explain in plain terms what the following variables do :

How do they interact with the sizeCtl variable ? It seems this variable is used for multiple operations, none of which are very well documented.

Sorry if this seems like a rant, but its frustrating to not understand the bit manipulations.


Solution

  • These values are only used internally, so they don't need to be well documented.

    Still, let's try to explain the different roles and values that sizeCtl has:

    • if table is null: initial table size when specified in the constructor, 0 for default table size (DEFAULT_CAPACITY which is 16) - this value is always greater or equal to 0
    • -1 if table is being initialized because some thread put the first value into the ConcurrentHashMap - either by calling the constructor with a Map of values or by calling one of the entry adding methods.
    • if table is not null: number of entries where the next resize will be started, calculated as n - n/4 with n being table.length - this value is always greater than 0
    • some special value while resizing the table - this value is always less than -1

    The special value while resizing is built from two parts:

    • resizeStamp which has a length of RESIZE_STAMP_BITS (16) and is placed in sizeCtl by shifting it left by RESIZE_STAMP_SHIFT (32 - RESIZE_STAMP_BITS which is incidentally also 16)
    • "resizerCount" which has a length of 32 - RESIZE_STAMP_BITS (16)

    You can picture this as

     31                                           16 15                                            0
    +--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--+
    |                  resizeStamp                  |                  resizerCount                 |
    +--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--+
    
    • MAX_RESIZERS is just the maximal value the the "resizerCount" part can hold.

    The resizerCount is used to control the acquisition of additionals threads to help out in resizing the ConcurrentHashMap. The thread that initiates the resizing operating sets its value to 2 (plus the value resizeStamp << RESIZE_STAMP_SHIFT). If additional threads try to add entries during a resize operation they check whether there are table entries to be moved and the value of the resizerCount is less than MAX_RESIZERS. If this is the case they join the resize operation by incrementing the resizerCount and the start moving map entries from the old table to the nextTable.

    Moving map entries from the old table to the nextTable is done in blocks (to prevent contention). After each block the threads participating in the resize operation check whether there are more blocks to move. If there are no more blocks, the thread decrements the resizerCount, checks if it is the last thread doing resizing (indicated by resizerCount now being 1) and if it is the last thread will finish the resize operation: change table to nextTable and set sizeCtl to the amount of entries that will trigger the next resize operation.


    Why is the resizeStamp needed?

    Because the threads must coordinate the resize work. A thread "X" that decides to participate in resizing reads the values of the table and nextTable fields and then tries to join the group of threads doing the resizing.

    It can happen that the thread "X" is suspended between reading the fields and joining the group of threads doing the resizing work and that the resizing of the table that it has read is already completed but a new resize is in progress. The value in resizeStamp encodes the size of the table array and lets the thread "X" detect that situation which means that it must re-read the values of the table and nextTable fields.