Tags: scala, apache-spark, redis, redisson

Redis call from Scala/Spark fails to return; thread dump shows what looks like a deadlock


I am new to the Scala and Spark world. Somewhere in our Scala code there is a Redis call, made through Redisson 3.9.1, that fetches keys. Only a very small number of records are involved, yet the call never returns, and the thread dump below appears to show a deadlock. Could someone please point out what the issue could be, so that I can look into it?

Full thread dump OpenJDK 64-Bit Server VM (25.282-b08 mixed mode):

"Attach Listener" #124 daemon prio=9 os_prio=0 tid=0x00007fbde8002800 nid=0x494e1 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Keep-Alive-Timer" #123 daemon prio=8 os_prio=0 tid=0x00007fbd68021000 nid=0x493d1 waiting on condition [0x00007fbb2fffe000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at sun.net.www.http.KeepAliveCache.run(KeepAliveCache.java:172)
        at java.lang.Thread.run(Thread.java:748)

"ForkJoinPool-1-worker-5" #117 daemon prio=5 os_prio=0 tid=0x00007fbc9c87b000 nid=0x475ba waiting on condition [0x00007fbb1dbfc000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00007fc68d00ed50> (a java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
        at org.redisson.command.CommandAsyncService.get(CommandAsyncService.java:182)
        at org.redisson.RedissonKeys$2.iterator(RedissonKeys.java:127)
        at org.redisson.RedissonKeys$2.iterator(RedissonKeys.java:123)
        at org.redisson.BaseIterator.hasNext(BaseIterator.java:54)
        at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
        at scala.collection.TraversableLike$class.filterNot(TraversableLike.scala:267)
        at scala.collection.AbstractTraversable.filterNot(Traversable.scala:104)
        at com.mycomosi.eaa.common.infrastructure.topology.store.TopologyStoreEntityService$$anonfun$getTopologyInstanceIdsExcludingVersion$1$$anonfun$apply$7.apply(TopologyStoreEntityService.scala:73)
        at com.mycomosi.eaa.common.infrastructure.topology.store.TopologyStoreEntityService$$anonfun$getTopologyInstanceIdsExcludingVersion$1$$anonfun$apply$7.apply(TopologyStoreEntityService.scala:70)
        at scala.collection.immutable.Set$Set1.foreach(Set.scala:94)
        at com.mycomosi.eaa.common.infrastructure.topology.store.TopologyStoreEntityService$$anonfun$getTopologyInstanceIdsExcludingVersion$1.apply(TopologyStoreEntityService.scala:70)
        at com.mycomosi.eaa.common.infrastructure.topology.store.TopologyStoreEntityService$$anonfun$getTopologyInstanceIdsExcludingVersion$1.apply(TopologyStoreEntityService.scala:69)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
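One way to check whether the JVM itself sees a deadlock is the standard ThreadMXBean.findDeadlockedThreads() call, which reports threads caught in a cycle of monitor or ownable-synchronizer locks. The sketch below is an illustration, not part of the original code: the object name DeadlockCheck and the helper thread name are hypothetical, and main parks a helper thread on a CountDownLatch to mimic the ForkJoinPool-1-worker-5 frame in the dump. A thread waiting like that has no lock owner to form a cycle with, so it is not reported.

```scala
import java.lang.management.ManagementFactory
import java.util.concurrent.CountDownLatch

object DeadlockCheck {
  // Returns the names of threads the JVM reports as deadlocked, i.e. a cycle of
  // threads each blocked on a monitor or ownable synchronizer held by another.
  def deadlockedThreadNames(): Seq[String] = {
    val mx = ManagementFactory.getThreadMXBean
    val ids = mx.findDeadlockedThreads() // null when no deadlock is found
    if (ids == null) Seq.empty
    else mx.getThreadInfo(ids).toSeq.filter(_ != null).map(_.getThreadName)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical helper thread parked on a latch, similar to the worker
    // waiting in CountDownLatch.await inside CommandAsyncService.get.
    val latch = new CountDownLatch(1)
    val waiter = new Thread(new Runnable {
      def run(): Unit = latch.await()
    }, "latch-waiter")
    waiter.start()
    Thread.sleep(200) // give the helper time to park

    val names = deadlockedThreadNames()
    println(
      if (names.isEmpty) "No Java-level deadlock detected"
      else s"Deadlocked threads: ${names.mkString(", ")}")

    latch.countDown() // release the helper so the JVM can exit
    waiter.join()
  }
}
```

Running main prints "No Java-level deadlock detected": a parked waiter is blocked, but it is not deadlocked.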
        
    

Solution

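  • The actual cause was not a deadlock but a slow getKeysByPattern call in the Redisson client. The thread in the dump is parked in CountDownLatch.await inside CommandAsyncService.get, which means it is waiting for the next reply from Redis while BaseIterator.hasNext fetches another batch of keys. The keys-by-pattern iterator is backed by the Redis SCAN command with a default count of 10. SCAN walks the entire keyspace and applies the pattern to each batch, so with a large volume of data the iterator needs a very large number of round trips even when only a few keys match. Passing a higher count value (Redisson's getKeysByPattern(pattern, count) overload) makes it perform much better.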

    I have also referenced this issue on GitHub, please refer to:

    https://github.com/redisson/redisson/issues/4635
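To see why the count value matters, here is a minimal Scala model of how a keys-by-pattern iterator walks the keyspace with SCAN. It is a simplification under stated assumptions: the cursor is a plain index, COUNT is treated as an exact batch size (in Redis it is only a hint), and the names ScanRoundTrips, scan and collectMatching, the keyspace size and the "topology:" prefix are all hypothetical. Each SCAN call stands for one blocking round trip, the same kind of wait the worker thread is parked on in the dump.

```scala
object ScanRoundTrips {
  // Simplified SCAN: examine up to `count` keys starting at `cursor`, apply the
  // MATCH filter to that batch only, and return the next cursor (0 once the
  // whole keyspace has been walked). Real SCAN uses a hash-table cursor.
  def scan(keyspace: IndexedSeq[String], cursor: Int, count: Int,
           matches: String => Boolean): (Int, Seq[String]) = {
    val batch = keyspace.slice(cursor, cursor + count)
    val next = cursor + batch.size
    (if (next >= keyspace.size) 0 else next, batch.filter(matches))
  }

  // Walk the keyspace like a keys-by-pattern iterator, counting one round trip
  // per SCAN call until the cursor comes back as 0.
  def collectMatching(keyspace: IndexedSeq[String], count: Int,
                      matches: String => Boolean): (Seq[String], Int) = {
    var cursor = 0
    var roundTrips = 0
    val found = Vector.newBuilder[String]
    do {
      val (next, batch) = scan(keyspace, cursor, count, matches)
      found ++= batch
      cursor = next
      roundTrips += 1
    } while (cursor != 0)
    (found.result(), roundTrips)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical keyspace: 1,000,000 keys, only 5 of which match "topology:*".
    val keyspace = (0 until 1000000).map { i =>
      if (i % 200000 == 0) s"topology:$i" else s"other:$i"
    }
    val matches = (k: String) => k.startsWith("topology:")
    for (count <- Seq(10, 1000)) {
      val (keys, trips) = collectMatching(keyspace, count, matches)
      println(s"count=$count -> ${keys.size} matching keys in $trips round trips")
    }
  }
}
```

With this hypothetical keyspace both runs find the same 5 keys, but count=10 needs 100000 round trips while count=1000 needs 1000. Real timings depend on network latency and server load, so the model only shows the proportion.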