Tags: java, performance, garbage-collection, locking, jfr

Massive locks in the JVM on array allocations on java.lang.ref.Reference$Lock


We are profiling our application with Java Flight Recorder and found massive lock contention on a java.lang.ref.Reference$Lock object.

I investigated several places in the stack traces and found out that in all cases there is an array allocation.

[JFR screenshot: stack traces of threads blocked on Reference$Lock]

Example of code (position 3 in the image above):

    public static char[] copyOfRange(char[] original, int from, int to) {
        int newLength = to - from;
        if (newLength < 0)
            throw new IllegalArgumentException(from + " > " + to);

        // the stack trace points to the next line
        char[] copy = new char[newLength];

        System.arraycopy(original, from, copy, 0,
                         Math.min(original.length - from, newLength));
        return copy;
    }

I suspect that such locking has something to do with GC, but I can't find any relevant information. Where can I read more about this topic?

End goal: understand what is happening in such cases, which factors affect it, and how we can decrease the lock time of such operations.

Some details from comments:

  1. Java 8
  2. Heap 512Mb
  3. GC - G1
  4. Through experiments I found out that the lock time decreases as the heap size increases (a minimal reproduction sketch follows this list).
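
Below is a minimal, hypothetical sketch of the kind of workload we suspect triggers this behaviour (it is not our actual code): several threads copying arrays in a loop on a small heap, e.g. started with -Xmx64m -XX:+UseG1GC.

    import java.util.Arrays;

    public class AllocationStress {
        // Volatile sink so the JIT cannot eliminate the copies as dead code.
        static volatile int sink;

        public static void main(String[] args) {
            int threads = Runtime.getRuntime().availableProcessors();
            for (int i = 0; i < threads; i++) {
                new Thread(() -> {
                    char[] source = new char[64 * 1024];
                    while (true) {
                        // Each iteration allocates a new char[] on the heap;
                        // with a small heap this forces frequent garbage collections.
                        char[] copy = Arrays.copyOfRange(source, 0, source.length);
                        sink = copy.length;
                    }
                }).start();
            }
        }
    }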

Solution

  • A big shortcoming of Java Flight Recorder is that it shows only Java stacks, completely ignoring the native and the VM part.

    async-profiler is much more accurate in this sense. If you run it in lock profiling mode with native stacks enabled, it will show you the exact place in the JVM where these locks are acquired. Example command:

    ./profiler.sh -d 60 -e lock --cstack fp -f profile.html -o flamegraph=total PID
    
    • -d 60 run profiling for 60 seconds
    • -e lock profile lock contention
    • --cstack fp record C (native) stacks
    • -f profile.html output file name (HTML format in async-profiler 2.0, or SVG in 1.x)
    • -o flamegraph=total build the flame graph using the total lock wait time as a counter
    • PID Java process ID

    [Flame graph: lock profile]

    In this example, the Flame Graph highlighted lock contention on a Reference$Lock instance. The Java part of the stack traces is displayed in green. This matches the stack traces you've seen in JFR. As in your case, the top Java frame is Arrays.copyOfRange (the graph shows other stacks as well, but let's focus on the first one).

    The yellow part is the native C++ code. Let me explain what happens there.

    1. Arrays.copyOfRange calls the VM runtime function OptoRuntime::new_array_nozero_C. The actual array allocation happens in the C++ code of the JVM.

    2. The JVM fails to allocate the array from the existing Thread Local Allocation Buffer (TLAB) and falls back to the slow-path allocation of a new TLAB.

    3. The slow-path allocation doesn't succeed either, because there is not enough free memory in the Java heap. So the JVM synchronously invokes the Garbage Collector.

    4. In the GC prologue, the JVM tries to acquire the lock that protects the list of pending references. This is to ensure that the ReferenceHandler thread leaves the critical section before the GC starts. While holding this lock, the JVM can safely append newly discovered weak references to the pending list (a simplified sketch of this mechanism follows the list).

    5. However, the lock is already held by another thread that has concurrently attempted to invoke the Garbage Collector in the same way. The current thread is suspended until the GC finishes.
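
    To make steps 4 and 5 more concrete, here is a simplified, hypothetical sketch of the mechanism (it is not the actual OpenJDK source; the names are illustrative). The monitor in the middle of it is what shows up as java.lang.ref.Reference$Lock in the profile.

    public class PendingListSketch {
        // The monitor that appears as java.lang.ref.Reference$Lock in the profile.
        private static final Object lock = new Object();

        // Head of the "pending" list of references discovered by the GC
        // (simplified to a single slot instead of a linked list).
        private static Object pending;

        // GC prologue (conceptually): acquire the lock so that newly discovered
        // weak/soft/phantom references can be safely appended to the pending list.
        // An allocating thread that triggers a GC blocks here if another thread
        // has already started the collection the same way.
        static void gcPrologueAppend(Object discovered) {
            synchronized (lock) {
                pending = discovered;   // append to the pending list
                lock.notifyAll();       // wake up the ReferenceHandler thread
            }
        }

        // ReferenceHandler thread (conceptually): take the same lock, drain the
        // pending list, then hand the references to their ReferenceQueues.
        static void referenceHandlerLoop() throws InterruptedException {
            for (;;) {
                Object r;
                synchronized (lock) {
                    while (pending == null) {
                        lock.wait();    // lock is released while waiting
                    }
                    r = pending;
                    pending = null;     // drain
                }
                // enqueue r to its ReferenceQueue outside the critical section (omitted)
            }
        }
    }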

    To sum it up, multiple Java threads concurrently attempted to allocate an object from the Heap, but the Heap was full. So, the Garbage Collection started, and the allocating threads were blocked on Reference$Lock - the reference pending list lock.

    The contention on Reference$Lock is not a problem per se. The allocating threads cannot proceed anyway until the GC reclaims enough memory. The actual issue is that concurrent garbage collection does not keep up with the allocation rate.

    To mitigate the problem, try one or more of the following approaches:

    • increase the heap size;
    • reduce the allocation rate;
    • increase the number of concurrent GC threads - ConcGCThreads;
    • lower InitiatingHeapOccupancyPercent to start the concurrent GC cycle earlier.

    Increasing the heap will probably be the most effective option; a combined example of these flags is sketched below.
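
    For illustration, the options above might be combined like this (the values and app.jar are placeholders, not tuned recommendations):

    java -Xmx1g \
         -XX:+UseG1GC \
         -XX:ConcGCThreads=4 \
         -XX:InitiatingHeapOccupancyPercent=35 \
         -jar app.jar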

    BTW, async-profiler has other useful modes for diagnosing GC-related issues:

    • -e cpu shows what takes the most CPU time. Java and VM threads are displayed together on the same graph, so you can get an idea whether GC activity is too high compared to the application's own work.
    • -e alloc shows the code that allocates the most. It's especially useful when investigating how to reduce the allocation rate (example commands below).
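
    For example, reusing the flags explained above:

    ./profiler.sh -d 60 -e cpu   -f cpu.html   PID
    ./profiler.sh -d 60 -e alloc -f alloc.html PID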