Search code examples
javaperformancedata-structuresconcurrencyhashmap

Java ConcurrentHashMap is better than HashMap performance wise?


I was just reading the book Clean Code and came across this statement:

When Java was young Doug Lea wrote the seminal book[8] Concurrent Programming in Java. Along with the book he developed several thread-safe collection, which later became part of the JDK in the java.util.concurrent package. The collections in that package are safe for multithreaded situations and they perform well. In fact, the ConcurrentHashMap implementation performs better than HashMap in nearly all situations. It also allows for simultaneous concurrent reads and writes, and it has methods supporting common composite operations that are otherwise not thread safe. If Java 5 is the deployment environment, start with ConcurrentHashMap

Note that in the above quote I used "[n]", where n is some number, to indicate the places where the author provided references, and as you can see he did not provide any reference for the bold part.

Not that I don't believe this statement, but I would love to know the supporting evidences of this statement. So, does anyone know any resources that shows the performance statistics for both ConcurrentHashMap and HashMap? Or can anyone explain to me why ConcurrentHashMap is faster than HashMap?

I probably will look into ConcurrentHashMap's implementation at work when I'm taking a break, but for now I would like to hear the answers from fellow SOers.


Solution

  • Doug Lea is extremely good at these things, so I won't be surprised if at one time his ConcurrentHashMap performs better than Joshua Bloch's HashMap. However as of Java 7, the first @author of HashMap has become Doug Lea too. Obviously now there's no reason HashMap would be any slower than its concurrent cousin.

    Out of curiosity, I did some benchmark anyway. I run it under Java 7. The more entries there are, the closer the performance is. Eventually ConcurrentHashMap is within 3% of HashMap, which is quite remarkable. The bottleneck is really memory access, as the saying goes, "memory is the new disk (and disk is the new tape)". If the entries are in the cache, both will be fast; if the entries don't fit in cache, both will be slow. In real applications, a map doesn't have to be big to compete with others for residing in cache. If a map is used often, it's cached; if not, it's not cached, and that is the real determining factor, not the implementations (given both are implemented by the same expert)

    public static void main(String[] args)
    {
        for(int i = 0; i<100; i++)
        {
            System.out.println();
    
            int entries = i*100*1000;
            long t0=test( entries, new FakeMap() );
            long t1=test( entries, new HashMap() );
            long t2=test( entries, new ConcurrentHashMap() );
    
            long diff = (t2-t1)*100/(t1-t0);
            System.out.printf("entries=%,d time diff= %d%% %n", entries, diff);
        }
    }
    
    
    static long test(int ENTRIES, Map map)
    {
        long SEED = 0;
        Random random = new Random(SEED);
    
        int RW_RATIO = 10;
    
        long t0 = System.nanoTime();
    
        for(int i=0; i<ENTRIES; i++)
            map.put( random.nextInt(), random.nextInt() );
    
        for(int i=0; i<RW_RATIO; i++)
        {
            random.setSeed(SEED);
            for(int j=0; j<ENTRIES; j++)
            {
                map.get( random.nextInt() );
                random.nextInt();
            }
        }
        long t = System.nanoTime()-t0;
        System.out.printf("%,d ns %s %n", t, map.getClass());
        return t;
    }
    
    
    static class FakeMap implements Map
    {
        public Object get(Object key)
        {
            return null;  
        }
        public Object put(Object key, Object value)
        {
            return null;  
        }
        // etc. etc.
    }