While moving from CMS to G1 for some of our applications, I noticed that one of them suffered from a startup time extended by factor 4. Application stop time due to GC cycles is not the cause. On comparing application behaviour, I disovered that this one carries a whopping 250 million of live objects after startup (in a heap of 12G). Further investigation showed that the application had normal speed during the first 5 million allocations, but performance degraded more and more as the pool of live objects grew larger.
Further experiments indicated that once a certain threshold of live objects is reached, allocation of new objects will indeed slow down when using G1. I found that doubling the number of live objects seems to put a factor of around 2.5 on the time needed for that allocation. With other GC engines, the factor is only 2. This would indeed explain the slowdown.
There are two issues which make me doubt that conclusion, though:
So: it would be great if someone could either tell me that my observations are correct, and perhaps point me to some explaining document, or to some recommendatons regarding this area. Or, alternatively, that someone tells me what I'm doing wrong. :)
Here is a short test case (run multiple times, take mean values, deduct the displayed garbage collection times):
import java.util.HashMap;
/**
* Allocator demonstrates the dependency between number of live objects
* and allocation speed, using various GC algorithms.
* Call it using, e.g.:
* java Allocator -Xmx12g -Xms12g -XX:+PrintGCApplicationStoppedTime -XX:+UseG1GC
* java Allocator -Xmx12g -Xms12g -XX:+PrintGCApplicationStoppedTime
* Deduct stopped times from execution time.
*/
public class Allocator {
public static void main(String[] args) {
timer(2000000, true);
for (int i = 1000000; i <= 32000000; i*=2) {
timer(i, false);
}
for (int i = 32000000; i >= 1000000; i/=2) {
timer(i, false);
}
}
private static void timer(int num, boolean warmup) {
long before = System.currentTimeMillis();
Allocator a = new Allocator();
int size = a.allocate(num);
long after = System.currentTimeMillis();
if (!warmup) {
System.out.println("Time needed for " + num + " allocations: "
+ (after - before) + " millis. Map size = " + size);
}
}
private int allocate(int numElements) {
HashMap<Integer, String> map = new HashMap<>(2*numElements);
for (int i = 0; i < numElements; i++) {
map.put(i, Integer.toString(i));
}
return map.size();
}
}
As discussed in the comments above:
Your test-case does pre-allocate very large reference arrays that are long-lived and essentially occupy a region of their own (they probably end up in the old gen or humongous regions) and then populating them with millions of additional objects that are likely to live in a different region.
This creates lots of cross-region references, which G1 can deal with in in moderate amounts but not in the millions per region.
Highly interconnected regions are also considered as expensive to collect by G1's heuristics and thus less likely to get collected even if they entirely consist of garbage.
Allocating objects together to reduces cross-region references.
Not articially extending their lifetime too much (e.g. by putting them into some sort of cache) also allows them to die during a young generation GC, which are much easier to collect than old regions which by nature accumulate objects referenced from different regions.
So all in all your test-case was rather hostile to the region-based nature of G1.