javaperformancehashmapintersection

Java HashMap retainAll time complexity


I tested HashMap retainAll method to test which one is faster with JMH.

Two HashMap A, B

  • Size : A > B, A : 300~400, B : 100~200
  • Most of B elements is in A, so B - A is like 10~20, very small

Test : get intersecion of two HashMap

  1. A.keySet().retainAll(B.keySet())

  2. B.keySet().retainAll(A.keySet())

//JMH 
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Fork(value = 1, jvmArgs={"-Xms4G", "-Xmx4G"})
@Warmup(iterations = 5, time = 5)
@Measurement(iterations = 10, time = 5)

private Map<String, MyClass> bigOne, smallOne;

private final int testIteration = 1000;

@Benchmark
public void smallToBig(){
    for(int i=0;i<testIteration;i++){
        smallOne.keySet().retainAll(bigOne.keySet());
    }
}

@Benchmark
public void bigToSmall(){
    for(int i=0;i<testIteration;i++){
        bigOne.keySet().retainAll(smallOne.keySet());
    }
}

RetainAll method uses contains in AbstractMap, so HashMap contains is O(1). so iterating Map will takes most of performance, means smallOne iteration should be faster.

I expected the former to be faster, but the actual results were different

Result

I tried lots of time, but get same result. Could you tell me why the Big.retainAll(Small) is faster?

Thanks for helping.


Solution

  • contains has complexity O(1) in average, and O(n) in worst case. EDIT: it is O(logn) in worst case after java 8!

    For Big.retainAll(Small) you do "contains" on a small set and many removes on a big set. For Small.retainAll(Big) you do contains on a larger set and almost no removes u said. (and removes are for sure more complex than contains anyway).

    So, if your benchmarks are correct this could only mean that your contains complexity is closer to O(n) and this is why smallToBig takes more time. Try some different keys, maybe Integer for example to see if the same happens

    EDIT: i just run into some analysis of Java hashcode for String and how you can exploit it to run attacks. this slide came up (source is princeton algorithms course)

    enter image description here