java caching locking distributed gridgain

How can I manage unreleased locks in GridGain's cache?


I have a distributed application that uses GridGain for caching and distributed locking. When my app starts, it joins a grid of several nodes. I ran into a problem when one of my nodes stopped (e.g. during a redeploy of my app): after it rejoined the grid, I saw this in my log:

    [13:57:32,140][WARNING][main][GridDhtPreloader] <cacheLocks> Failed to wait for initial partition map exchange. Possible reasons are:
      ^-- Transactions in deadlock.
      ^-- Long running transactions (ignore if this is the case).
      ^-- Unreleased explicit locks.
    [13:57:33,085][WARNING][grid-timeout-worker-#33%null%][GridDhtPartitionsExchangeFuture] <cacheLocks> Retrying preload partition exchange due to timeout [done=false, dummy=false, exchId=GridDhtPartitionExchangeId [topVer=56, nodeId=ee95b126, evt=NODE_JOINED], rcvdIds=[03e6666c], rmtIds=[ba1d527c, 03e6666c, 76bf5103], remaining=[ba1d527c, 76bf5103], init=true, initFut=true, ready=true, replied=false, added=true, oldest=76bf5103, oldestOrder=46, evtLatch=0, locNodeOrder=56, locNodeId=ee95b126-aaf9-4d46-9273-983e175d513a]

It was neither a deadlock nor a long-running transaction. I suspect it was an explicit lock left unreleased because the JVM stopped immediately: right before that, my app had called cache.lock(key, 0L) but had not yet called cache.unlock(key).

All I could do was restart the entire grid.

So my question is: how can I avoid unreleased locks, and how can I manage them when they occur? How should I handle such situations correctly?


Solution

  • You do not need to make any additional effort to release previously acquired locks on remote nodes upon node failure; GridGain will release those locks automatically. Can you update to the latest version of GridGain (6.5.6) and retry your tests? If that does not help, please create a reproducible example and attach it to the question.
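
For ordinary (non-crash) code paths, it can still help to pair every lock with an unlock in a finally block, so that an exception in application code does not leave a lock held. Below is a minimal sketch in Java under stated assumptions: LockingCache is a hypothetical interface that mirrors only the cache.lock(key, 0L) and cache.unlock(key) calls from the question, LocalLockingCache is an in-memory stand-in so the example runs without a grid, and withLock is an illustrative helper name. None of these are GridGain classes, and treating a timeout of 0 as "wait indefinitely" is an assumption to check against the documentation for your GridGain version.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

public class LockGuardSketch {

    /** Hypothetical stand-in for the two cache calls quoted in the question. */
    interface LockingCache<K> {
        boolean lock(K key, long timeoutMs);
        void unlock(K key);
    }

    /** In-memory stub so the sketch runs without a grid; this is not GridGain code. */
    static final class LocalLockingCache<K> implements LockingCache<K> {
        private final ConcurrentMap<K, ReentrantLock> locks = new ConcurrentHashMap<>();

        @Override
        public boolean lock(K key, long timeoutMs) {
            ReentrantLock l = locks.computeIfAbsent(key, k -> new ReentrantLock());
            if (timeoutMs == 0L) {
                l.lock(); // assumption: 0 means "wait indefinitely"
                return true;
            }
            try {
                return l.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }

        @Override
        public void unlock(K key) {
            ReentrantLock l = locks.get(key);
            // Only release a lock that the calling thread actually holds.
            if (l != null && l.isHeldByCurrentThread()) {
                l.unlock();
            }
        }

        boolean isLocked(K key) {
            ReentrantLock l = locks.get(key);
            return l != null && l.isLocked();
        }
    }

    /** Runs body while holding the lock on key; the lock is released even if body throws. */
    static <K, T> T withLock(LockingCache<K> cache, K key, long timeoutMs, Supplier<T> body) {
        if (!cache.lock(key, timeoutMs)) {
            throw new IllegalStateException("Could not acquire lock for key: " + key);
        }
        try {
            return body.get();
        } finally {
            cache.unlock(key);
        }
    }

    public static void main(String[] args) {
        LocalLockingCache<String> cache = new LocalLockingCache<>();
        String result = withLock(cache, "order-42", 0L, () -> "updated");
        System.out.println(result + ", still locked: " + cache.isLocked("order-42"));
    }
}
```

This pattern does not help with an abrupt JVM stop, because finally blocks never run in that case. That is the situation the answer above describes, where GridGain is expected to release the failed node's locks on its own.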