java concurrenthashmap

Facts about ConcurrentHashMap


I have read a couple of statements about ConcurrentHashMap from various sources and wanted to verify whether they are true.

  1. After the iterator for a ConcurrentHashMap is created, ONLY the removal and update operations by threads are guaranteed to be reflected. Does the iterator refresh its snapshot after an edit/remove? Why would the iterator treat an update/remove any differently from an add?

  2. ConcurrentHashMap shards its data into segments to reduce writer lock contention. The concurrencyLevel parameter directly specifies the number of shards that are created by the class internally. If we simply use the parameterless constructor, and accept the default configuration, the map will be instantiating the objects needed for 16 shards before you even add your first value… What do we mean by a shard here? Is it a bucket of data within the map, or a copy of the entire map? I understood this to be similar to a page in a database, which can be independently locked for update. Why would the concurrencyLevel impact memory then?


Solution

  • Does the iterator refresh its snapshot after an edit/remove? Why would the iterator treat an update/remove any differently from an add?

    The API documentation describes CHM's iterators as follows:

    Similarly, Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration.

    This means the returned iterator may or may not reflect changes made to the map while you iterate. Imagine you create the iterator, traverse an entire segment, and move on to the next one. If an add or remove then happens in the first segment, which you have already finished traversing, you won't see it, and that's OK: it does not violate the API.

    As to your second question: add is implied. There is no difference with respect to visibility between add and remove.
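    The weakly consistent behaviour described above can be sketched in a few lines. This is only an illustration: the class name `WeaklyConsistentIteration`, the method `iterateWhileMutating`, and the key `"added-during-iteration"` are made up for this example. It shows that mutating the map mid-iteration does not throw a `ConcurrentModificationException`, and that the iterator may or may not yield the newly added key.

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; not part of the JDK.
class WeaklyConsistentIteration {

    // Iterates over the key set and adds one new key after the first element.
    // Returns how many keys the iterator actually yielded.
    static int iterateWhileMutating(ConcurrentHashMap<String, Integer> map) {
        int seen = 0;
        boolean mutated = false;
        for (Iterator<String> it = map.keySet().iterator(); it.hasNext(); ) {
            it.next();
            seen++;
            if (!mutated) {
                // With a plain HashMap this would typically cause a
                // ConcurrentModificationException on the next call to next().
                map.put("added-during-iteration", 99);
                mutated = true;
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        int seen = iterateWhileMutating(map);
        // The iterator yields the three original keys and possibly the new one.
        System.out.println("Iterator yielded " + seen + " keys; map now holds " + map.size());
    }
}
```

    Whether the iterator reports three or four keys depends on where the new key lands relative to the iterator's current position, which is exactly the "may or may not reflect changes" behaviour the documentation allows.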

    Why would the concurrencyLevel impact memory then?

    Memory has been an issue with ConcurrentHashMap since it was released. By default, each level of concurrency creates a single Segment. Each Segment has its own HashEntry table and is also a ReentrantLock (so it carries everything a lock needs as well).
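    To make the idea of a segment concrete, here is a toy sketch of lock striping. It is not the JDK's implementation: `StripedMap`, `segmentFor`, and `segmentCount` are invented names, each segment wraps a plain `HashMap` instead of a HashEntry table, and reads take the lock for simplicity. It does show why a segment is a slice of the data rather than a copy of the map, and why every extra segment costs a lock object plus a table even when the map is empty.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Toy illustration of lock striping; NOT how the JDK implements ConcurrentHashMap.
class StripedMap<K, V> {

    // Each segment is itself a lock and owns its own small table,
    // mirroring the "Segment is a ReentrantLock" structure described above.
    private static final class Segment<K, V> extends ReentrantLock {
        final Map<K, V> table = new HashMap<>();
    }

    private final Segment<K, V>[] segments;

    @SuppressWarnings("unchecked")
    StripedMap(int concurrencyLevel) {
        // All segments are allocated up front, so an empty map already
        // pays for concurrencyLevel locks and tables.
        segments = (Segment<K, V>[]) new Segment[concurrencyLevel];
        for (int i = 0; i < segments.length; i++) {
            segments[i] = new Segment<>();
        }
    }

    // Picks the one segment responsible for this key.
    private Segment<K, V> segmentFor(Object key) {
        int h = key.hashCode();
        h ^= (h >>> 16); // spread the high bits a little
        return segments[(h & 0x7fffffff) % segments.length];
    }

    V put(K key, V value) {
        Segment<K, V> s = segmentFor(key);
        s.lock(); // only this segment is locked; writers to other segments proceed
        try {
            return s.table.put(key, value);
        } finally {
            s.unlock();
        }
    }

    V get(Object key) {
        Segment<K, V> s = segmentFor(key);
        s.lock(); // simplification: this sketch also locks for reads
        try {
            return s.table.get(key);
        } finally {
            s.unlock();
        }
    }

    int segmentCount() {
        return segments.length;
    }
}
```

    For example, `new StripedMap<String, Integer>(16)` allocates 16 locks and 16 tables before the first `put`, which is the kind of fixed overhead the question asks about.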

    Java 8 ships CHMv8, which actually solves this memory issue.

    You can read more on the memory here, specifically:

    Whilst doing a memory profile, JVisualVM showed that the top culprit was the ConcurrentHashMap.Segment class. The default number of segments per ConcurrentHashMap is 16. The HashEntry tables within the segments would probably be tiny, but each Segment is a ReentrantLock. Each ReentrantLock contains a Sync, in this case a NonFairSync, which is a subclass of Sync and then AbstractQueuedSynchronizer. Each of these contains a queue of nodes that maintain state of what is happening with your threads. It is used when fairness is determined. This queue and the nodes use a lot of memory.
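    If you are on a pre-Java 8 runtime and have many maps that are rarely written concurrently, one possible mitigation is to pass a lower concurrencyLevel through the three-argument constructor. The sketch below is only an illustration: `LowFootprintMaps` and `singleWriterMap` are hypothetical names, and the load factor of 0.75f is simply the documented default. On Java 8 and later the same constructor still exists, but the documentation describes concurrencyLevel as a sizing hint only.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper class; not part of the JDK.
class LowFootprintMaps {

    // Builds a map that expects a single updating thread.
    // Assumption: the caller really has few concurrent writers; with many
    // writers a low concurrencyLevel increases lock contention on old JDKs.
    static <K, V> ConcurrentHashMap<K, V> singleWriterMap(int initialCapacity) {
        // Arguments: initialCapacity, loadFactor, concurrencyLevel
        return new ConcurrentHashMap<>(initialCapacity, 0.75f, 1);
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = singleWriterMap(16);
        map.put("answer", 42);
        System.out.println(map.get("answer"));
    }
}
```

    The map behaves the same from the caller's point of view; only the internal sizing, and on older JDKs the number of segments, changes.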