We were profiling our application with Java Flight Recorder and found massive lock contention on a java.lang.ref.Reference$Lock object.
I investigated several places in the stack traces and found that in all cases there is an array allocation.
Example of such code (position 3 on the image):
public static char[] copyOfRange(char[] original, int from, int to) {
    int newLength = to - from;
    if (newLength < 0)
        throw new IllegalArgumentException(from + " > " + to);
    // the stack trace points at the next line
    char[] copy = new char[newLength];
    System.arraycopy(original, from, copy, 0,
                     Math.min(original.length - from, newLength));
    return copy;
}
I suspect that this locking has something to do with GC, but I can't find any relevant information. Where can I read more on this topic?
End goal: understand what is happening in such cases, which factors affect it, and how we can decrease the lock time of such operations.
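A minimal stand-alone reproducer may help when experimenting. The following sketch is hypothetical (it is not the original application): several threads allocate short-lived arrays as fast as they can, so with a small heap (e.g. -Xmx64m) a lock profiler may show them blocked waiting for GC.

```java
import java.util.Arrays;

public class AllocationPressure {
    public static void main(String[] args) throws InterruptedException {
        char[] source = new char[1024];
        int nThreads = Runtime.getRuntime().availableProcessors();
        Thread[] workers = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 200_000; j++) {
                    // Each call allocates a fresh char[] on the heap
                    char[] copy = Arrays.copyOfRange(source, 0, source.length);
                    if (copy.length != source.length) {
                        throw new AssertionError("unexpected copy length");
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("done");
    }
}
```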
Some details from comments:
A big shortcoming of Java Flight Recorder is that it shows only Java stacks, completely ignoring the native and VM parts.
async-profiler is much more accurate in this sense. If you run it in lock profiling mode with native stacks on, it will show you the exact place in the JVM where these locks are acquired. Example command:
./profiler.sh -d 60 -e lock --cstack fp -f profile.html -o flamegraph=total PID
-d 60
    run profiling for 60 seconds
-e lock
    profile lock contention
--cstack fp
    record C (native) stacks
-f profile.html
    output file name (HTML format in async-profiler 2.0, or SVG in 1.x)
-o flamegraph=total
    build the flame graph using the total lock wait time as a counter
PID
    Java process ID

In this example, the Flame Graph highlighted lock contention on a Reference$Lock
instance. The Java part of the stack traces is displayed in green. This matches the stack traces you have seen in JFR. As in your case, the top Java frame is Arrays.copyOfRange (the graph shows other stacks as well, but let's focus on the first one).
The yellow part is the native C++ code. Let me explain what happens there.
Arrays.copyOfRange calls the VM runtime function OptoRuntime::new_array_nozero_C: the actual array allocation happens in the C++ code of the JVM.
The JVM fails to allocate an array from the existing Thread Local Allocation Buffer (TLAB) and then falls back to the slow-path allocation of a new TLAB.
The slow-path allocation doesn't succeed either, because there is not enough free memory in the Java heap. So the JVM synchronously invokes the Garbage Collector.
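To see TLAB sizing and refill behaviour on your own JVM, the following diagnostic commands may help. The flag names are real HotSpot options; JDK 9+ is assumed for the unified logging (-Xlog) syntax.

```shell
# List the JVM's TLAB-related tuning flags and their current values
java -XX:+PrintFlagsFinal -version | grep -i tlab

# Trace TLAB refills (i.e. slow-path allocations) via unified logging
java -Xlog:gc+tlab=debug -version
```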
In the GC prologue, the JVM tries to acquire the lock that protects the list of pending references. This ensures that the ReferenceHandler thread leaves the critical section before the GC starts. While holding this lock, the JVM can safely append newly discovered weak references to the pending list.
However, the lock is already acquired by another thread that has concurrently attempted to invoke the Garbage Collector in the same way. The current thread is suspended until the GC finishes.
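The reference-handling machinery described above can be observed from plain Java code. This is a hedged sketch: the deterministic part is that a WeakReference returns its referent while it is strongly reachable; whether and when the reference is enqueued after System.gc() depends on the JVM.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

public class PendingListDemo {
    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<byte[]> queue = new ReferenceQueue<>();
        byte[] payload = new byte[1024];
        WeakReference<byte[]> ref = new WeakReference<>(payload, queue);

        // While the referent is strongly reachable, get() returns it.
        if (ref.get() != payload) {
            throw new AssertionError("referent should still be reachable");
        }

        // Drop the strong reference. Once a GC discovers the reference,
        // the JVM appends it to the pending list (the section guarded by
        // Reference$Lock), and the ReferenceHandler thread transfers it
        // to our queue.
        payload = null;
        System.gc();

        // Not guaranteed: GC and reference processing are asynchronous,
        // so this may time out on some JVMs.
        Reference<? extends byte[]> enqueued = queue.remove(5000);
        System.out.println("enqueued: " + (enqueued == ref));
    }
}
```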
To sum it up, multiple Java threads concurrently attempted to allocate objects from the heap, but the heap was full. So Garbage Collection started, and the allocating threads were blocked on Reference$Lock, the reference pending list lock.
The contention on Reference$Lock is not a problem per se. The allocating threads cannot proceed anyway until the GC reclaims enough memory. The actual issue is that concurrent garbage collection does not keep up with the allocation rate.
To mitigate the problem, try one or more of the following approaches:
Increase the heap size;
Increase the number of concurrent GC threads (ConcGCThreads);
Decrease InitiatingHeapOccupancyPercent to start the concurrent GC cycle earlier.

Increasing the heap will probably be the most efficient option.
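Put together, a tuned launch command might look like the following. The flag names are real HotSpot options, but the values and app.jar are purely illustrative placeholders, not a recommendation:

```shell
java -Xms8g -Xmx8g \
     -XX:+UseG1GC \
     -XX:ConcGCThreads=4 \
     -XX:InitiatingHeapOccupancyPercent=35 \
     -jar app.jar
```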
BTW, async-profiler has other useful modes for diagnosing GC-related issues:
-e cpu
    shows what takes the most CPU time. Java and VM threads are displayed together on the same graph, so you can get an idea whether GC activity is too high compared to the application work.
-e alloc
    shows the code that allocates the most. It's especially useful when investigating how to reduce the allocation rate.
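Following the pattern of the lock-profiling command above, these two modes would be invoked like this (PID is the target Java process ID; the output file names are arbitrary):

```shell
# CPU profile: Java, GC and VM threads appear on one graph
./profiler.sh -d 60 -e cpu -f cpu.html PID

# Allocation profile: shows which call sites allocate the most
./profiler.sh -d 60 -e alloc -f alloc.html PID
```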