Search code examples
javamultithreadingdeadlock

Is this a deadlock when using ReentrantReadWriteLock?


I have an application with 1 writer thread and 8 reader threads accessing a shared resource, which is behind a ReentrantReadWriteLock. It froze for about an hour, producing no log output and not responding to requests. This is on Java 8.

Before killing it someone took thread dumps, which look like this:

Writer thread:

"writer-0" #83 prio=5 os_prio=0 tid=0x00007f899c166800 nid=0x2b1f waiting on condition [0x00007f898d3ba000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000002b8dd4ea8> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
    at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)

Reader:

"reader-1" #249 daemon prio=5 os_prio=0 tid=0x00007f895000c000 nid=0x33d6 waiting on condition [0x00007f898edcf000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000002b8dd4ea8> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)

This looks like a deadlock, however there are a couple of things that make me doubt that:

  • I can't find another thread that could possibly be holding the same lock
  • Taking a thread dump 4 seconds later yealds the same result, but all threads now report parking to wait for <0x00000002a7daa878>, which is different than 0x00000002b8dd4ea8 in the first dump.

Is this a deadlock? I see that there is some change in the threads' state, but it could only be internal to the lock implementation. What else could be causing this behaviour?


Solution

  • It turned out it was a deadlock. The thread holding the lock was not reported as holding any locks in the thread dump, which made it difficult to diagnose.

    The only way to understand that was to inspect a heap dump of the application. For those interested in how, here's the process step-by-step:

    1. A heap dump was taken at roughly the same time as the thread dumps.
    2. I opened it using Java VisualVM, which comes with JDK.
    3. In the "Classes" view I filtered by the class name of the class that contains the lock as a field.
    4. I double-clicked on the class to be taken to the "Instances" view
    5. Thankfully, there were only a few instances of that class, so I was able to find the one that was causing problems.
    6. I inspected the ReentrantReadWriteLock object kept in a field in the class. In particular the sync field of that lock keeps its state - in this case it was ReentrantReadWriteLock$FairSync.
    7. The state property of it was 65536. This represents both the number of shared and exclusive holds of the lock. The shared holds count is stored in the first 16 bits of the state, and is retrieved as state >>> 16. The exclusive holds count is in the last 16 bits, and is retrieved as state & ((1 << 16) - 1). From this we can see that there's 1 shared hold and 0 exclusive holds on the lock.
    8. You can see the threads waiting for the lock in the head field. It is a queue, with thread containing the waiting thread, and next containing the next node in the queue. Going through it I found the writer-0 and 7 of the 8 reader-n threads, confirming what we know from the thread dump.
    9. The firstReader field of the sync object contains the thread that have acquired the read log - from the comment in the code firstReader is the first thread to have acquired the read lock. firstReaderHoldCount is firstReader's hold count.More precisely, firstReader is the unique thread that last changed the shared count from 0 to 1, and has not released theread lock since then; null if there is no such thread.

    In this case the thread holding the lock was one of the reader threads. It was blocked on something entirely different, which would have required one of the other reader threads to progress. Ultimately it was caused by a bug, in which a reader thread would not properly release the lock, and keep it forever. That I found by analyzing the code, and adding tracking and logging when the lock was acquired and released.