java multithreading serialization concurrenthashmap

Proper way to serialize a ConcurrentHashMap in a multithreaded environment


I am writing a class in which a static ConcurrentHashMap is used by multiple threads (with operations like get(), put(), clear(), etc.). In this class I also need to serialize the ConcurrentHashMap to a file and deserialize it from that file. The problem is that the ConcurrentHashMap can be modified while it is being serialized, so the serialization itself may not be thread-safe.

My questions are:

  1. ConcurrentHashMap is thread-safe, but is it safe for other threads to modify it while it is being serialized? (I guess the answer is no, but I need confirmation.)
  2. What is the best practice for serializing a ConcurrentHashMap in a multithreaded environment where it may be modified at the same time? Note that both safety and performance are critical for my application.

Solution

  • Start with the Javadoc for ConcurrentHashMap:

    A hash table supporting full concurrency of retrievals and high expected concurrency for updates.

    In that sense, the answer is: it depends.

    As the quote says, such a map can be read while it is being updated.

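    So in theory you could "serialize" the map by iterating over its entries and storing them. Iteration over a ConcurrentHashMap is only weakly consistent, though: it never throws ConcurrentModificationException, but it may or may not reflect updates made after it started, and you cannot tell which updates it missed. So that is not a good idea if you need a consistent snapshot.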
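
    To make the point about reading during updates concrete, here is a minimal single-threaded sketch. The class and method names are hypothetical. It inserts new keys while an iteration over the same map is in progress. The iteration completes without a ConcurrentModificationException, but whether it visits the late insertions is unspecified, which is why an entry-by-entry copy is not a reliable snapshot.

    ```java
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical demo class; the names are illustrative only.
    final class WeaklyConsistentDemo {

        // Iterates over the map and, from inside the loop, inserts up to
        // `extraKeys` new entries. Returns how many entries the iterator visited.
        // Assumes the caller's keys are all below 1_000_000 so the late keys are new.
        static int iterateWhileInserting(ConcurrentHashMap<Integer, String> map, int extraKeys) {
            int visited = 0;
            int added = 0;
            int nextKey = 1_000_000;
            for (Map.Entry<Integer, String> entry : map.entrySet()) {
                visited++;
                if (added < extraKeys) {
                    // Modifying the map mid-iteration is allowed here; the iterator
                    // may or may not see this new entry later on.
                    map.put(nextKey++, "late");
                    added++;
                }
            }
            return visited;
        }
    }
    ```

    Entries that were present when the iteration started are visited exactly once (as long as nothing removes them), so the returned count is at least the original size, but it can fall anywhere between the original size and the final size.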

    Next, consider serializing the whole map in one shot. The details depend on how you do it (plain Java object serialization, or a library such as Jackson or Gson that serializes the map to JSON), but either way the serializer has to walk the internals of the map object, and you do not want the map to be updated while that is going on.

    Conclusion: your only option is a lock that every thread must hold before it updates or serializes the map.

    See here for an introduction to the various types of locks.

    And you can't have it both ways. If the integrity of your data matters to you, then you have to block all add, update, and removal requests while the map is being serialized!
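
    One possible way to arrange such a lock, sketched under assumptions: the wrapper class GuardedMapStore, its method names, and the String/Integer types are all hypothetical and not from the question. The sketch uses a ReentrantReadWriteLock in a slightly unusual way. Updating threads take the shared (read) lock, because the ConcurrentHashMap already handles concurrent updates among themselves, and the serializing thread takes the exclusive (write) lock so that no update can run while the map is written out. Plain get() calls take no lock at all.

    ```java
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.OutputStream;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Hypothetical wrapper; class and method names are illustrative only.
    final class GuardedMapStore {
        private final ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        // Reads never conflict with serialization, so they need no lock.
        Integer get(String key) {
            return map.get(key);
        }

        // Updaters share the "read" lock: they may run concurrently with each
        // other, but not while a serializer holds the exclusive lock.
        void put(String key, Integer value) {
            lock.readLock().lock();
            try {
                map.put(key, value);
            } finally {
                lock.readLock().unlock();
            }
        }

        void clear() {
            lock.readLock().lock();
            try {
                map.clear();
            } finally {
                lock.readLock().unlock();
            }
        }

        // The serializer takes the exclusive lock, so the map cannot change
        // while it is being written with standard Java object serialization.
        void writeTo(OutputStream out) throws IOException {
            lock.writeLock().lock();
            try {
                ObjectOutputStream objectOut = new ObjectOutputStream(out);
                objectOut.writeObject(map);
                objectOut.flush();
            } finally {
                lock.writeLock().unlock();
            }
        }

        // Replaces the current contents with a previously serialized map.
        @SuppressWarnings("unchecked")
        void readFrom(InputStream in) throws IOException, ClassNotFoundException {
            Map<String, Integer> loaded =
                    (Map<String, Integer>) new ObjectInputStream(in).readObject();
            lock.writeLock().lock();
            try {
                map.clear();
                map.putAll(loaded);
            } finally {
                lock.writeLock().unlock();
            }
        }
    }
    ```

    To write to a file, pass a FileOutputStream to writeTo and close it afterwards. Updates still block for as long as the write takes, so this does not remove the trade-off described above; it only keeps updaters from blocking each other.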