javaperformancehashmapintersection

# Java HashMap retainAll time complexity

I tested HashMap retainAll method to test which one is faster with JMH.

Two HashMap A, B

• Size : A > B, A : 300~400, B : 100~200
• Most of B elements is in A, so B - A is like 10~20, very small

Test : get intersecion of two HashMap

1. A.keySet().retainAll(B.keySet())

2. B.keySet().retainAll(A.keySet())

``````//JMH
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Fork(value = 1, jvmArgs={"-Xms4G", "-Xmx4G"})
@Warmup(iterations = 5, time = 5)
@Measurement(iterations = 10, time = 5)

private Map<String, MyClass> bigOne, smallOne;

private final int testIteration = 1000;

@Benchmark
public void smallToBig(){
for(int i=0;i<testIteration;i++){
smallOne.keySet().retainAll(bigOne.keySet());
}
}

@Benchmark
public void bigToSmall(){
for(int i=0;i<testIteration;i++){
bigOne.keySet().retainAll(smallOne.keySet());
}
}
``````

RetainAll method uses contains in AbstractMap, so HashMap contains is O(1). so iterating Map will takes most of performance, means smallOne iteration should be faster.

I expected the former to be faster, but the actual results were different

I tried lots of time, but get same result. Could you tell me why the Big.retainAll(Small) is faster?

Thanks for helping.

Solution

• `contains` has complexity O(1) in average, and O(n) in worst case. EDIT: it is O(logn) in worst case after java 8!

For `Big.retainAll(Small)` you do "contains" on a small set and many removes on a big set. For `Small.retainAll(Big)` you do contains on a larger set and almost no removes u said. (and removes are for sure more complex than contains anyway).

So, if your benchmarks are correct this could only mean that your contains complexity is closer to O(n) and this is why smallToBig takes more time. Try some different keys, maybe Integer for example to see if the same happens

EDIT: i just run into some analysis of Java hashcode for String and how you can exploit it to run attacks. this slide came up (source is princeton algorithms course)