Tags: java, memory, garbage-collection, jvm, jvm-hotspot

Garbage Collection for several VMs


We have an application that usually consists of ~20 JVMs to which we distribute batch jobs. All 20 JVMs run on the same operating system. Before dispatching a batch job to one of them, it is hard to tell how long it will run or how big it will be: it could take one minute or several hours, and memory consumption varies just as much.

So far this worked well: we have a total of 40GB of memory available, and the max heap size is set to 2GB for each JVM (the full 2GB is occasionally needed). Since we never had too many "big" batch jobs running at the same time, we never ran into memory issues. That changed when we moved to the Java 8 VM. It seems the full GC is now triggered less frequently, and JVMs that are mostly idle keep rising in memory usage. When I trigger a GC via jcmd, I can see the old generation drop from roughly 1GB to 200MB.
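(For reference, the jcmd invocation I use is along the lines of jcmd <pid> GC.run.)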

I know this is not a good setup: 20 JVMs with 2GB max heap each, plus stacks and Metaspace, add up to far more than the 40GB of memory available in the worst case. But it is a situation we have to live with, and I would be surprised if there were a way to set a combined maximum heap size for a group of several JVMs. So I need to come up with another solution.

I was looking for a VM option that tells the VM to run a full GC at regular intervals, which would very likely solve our problem, but I can't find such an option.

Any suggestions on how we can set this up to avoid memory swapping?

EDIT: Here is a snippet from the gc log:

2016-04-14T01:02:49.413+0200: 37428.762: [Full GC (Ergonomics) [PSYoungGen: 28612K->0K(629248K)] [ParOldGen: 1268473K->243392K(1309184K)] 1297086K->243392K(1938432K), [Metaspace: 120332K->120320K(1181696K)], 0.3438924 secs] [Times: user=1.69 sys=0.02, real=0.35 secs] 
2016-04-14T01:02:52.442+0200: 37431.792: [GC (Allocation Failure) [PSYoungGen: 561664K->67304K(629248K)] 805056K->310696K(1938432K), 0.0315138 secs] [Times: user=0.26 sys=0.00, real=0.03 secs] 
2016-04-14T01:02:54.809+0200: 37434.159: [GC (Allocation Failure) [PSYoungGen: 628968K->38733K(623104K)] 872360K->309555K(1932288K), 0.0425780 secs] [Times: user=0.35 sys=0.00, real=0.04 secs] 
...
2016-04-14T10:09:03.558+0200: 70202.907: [GC (Allocation Failure) [PSYoungGen: 547152K->41386K(531968K)] 1545772K->1041036K(1841152K), 0.0255883 secs] [Times: user=0.18 sys=0.00, real=0.02 secs] 
2016-04-14T10:20:53.634+0200: 70912.984: [GC (Allocation Failure) [PSYoungGen: 531882K->40733K(542720K)] 1531532K->1042107K(1851904K), 0.0306816 secs] [Times: user=0.22 sys=0.02, real=0.03 secs] 
2016-04-14T10:23:10.830+0200: 71050.180: [GC (System.gc()) [PSYoungGen: 60415K->37236K(520192K)] 1061790K->1040674K(1829376K), 0.0228505 secs] [Times: user=0.17 sys=0.01, real=0.02 secs] 
2016-04-14T10:23:10.853+0200: 71050.203: [Full GC (System.gc()) [PSYoungGen: 37236K->0K(520192K)] [ParOldGen: 1003438K->170089K(1309184K)] 1040674K->170089K(1829376K), [Metaspace: 133559K->129636K(1196032K)], 1.4149811 secs] [Times: user=11.10 sys=0.02, real=1.42 secs] 

I guess a full GC every hour would solve our problem.
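Since the log shows that an explicit System.gc() does shrink the old generation, a minimal sketch of what such an hourly trigger could look like from inside the application is below (the class name and interval are illustrative, and it assumes explicit GC has not been disabled with -XX:+DisableExplicitGC):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: requests an explicit full GC once per hour.
// With the Parallel collector this produces the same "Full GC (System.gc())"
// entries that already appear in the log above.
public class PeriodicGcTrigger {

    public static void start() {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(runnable -> {
                    Thread t = new Thread(runnable, "periodic-gc");
                    t.setDaemon(true); // don't keep the JVM alive just for this
                    return t;
                });
        // No effect if the JVM runs with -XX:+DisableExplicitGC.
        scheduler.scheduleAtFixedRate(System::gc, 1, 1, TimeUnit.HOURS);
    }
}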


Solution

  • Instead of attempting to use time-triggered GCs, you could try running with -XX:GCTimeRatio=14 -XX:MaxHeapFreeRatio=30 -XX:MinHeapFreeRatio=20 (see the example launch command below). This tells the collector to keep less free-heap headroom, at the cost of collecting more often and spending more CPU cycles on GC.

    On current JDK 9 builds this could be further combined with -XX:-ShrinkHeapInSteps to let the committed heap size track the used heap even more closely, again potentially at the expense of performance.
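    Put together, a launch command for one of the worker JVMs might look roughly like this (the 2GB heap cap comes from the setup described in the question; the jar name is just a placeholder):

    java -Xmx2g -XX:GCTimeRatio=14 -XX:MaxHeapFreeRatio=30 -XX:MinHeapFreeRatio=20 -jar batch-worker.jar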