Search code examples
jbossjboss6.xinfinispan

Infinispan Cluster Synchronization Problems


we use a JBoss cluster (EAP 6.4.10) with 6 instances and heavily use the bundled Infinispan 5.2.11 for different in-memory-grid use cases. Most of them are distributed caches, however (replicated, actually). We also have distributed transactions and JMS bound to transactional borders in the mix. Backend is SQL Server 2016 Enterprise, and even here transactions could span multiple SQL server instances and databases (DTC).
Synchronization of Infinispan is done with UDP multicast using JGroups 3.2.13

From time to time, and especially under or after heavy load, we face the problem that JBoss worker threads pile up at specific Infinispan internal locks that seemingly never get released. Thus, the http connection pool starvates, open transactions on the databases don't get committed or rolled back; JMS messages get lost; we face a heap of locks on the database that affect all other systems connecting to the same databases (which are the other JBoss instances in the cluster).

Currently we watch the http thread pools and as soon as threads start to pile up for a certain amount of time, the instance is removed from the load balancer and shut down.
Sometimes this is unnecessary, i.e. the cluster heals itself and the culprit instance starts to behave normally again without manual intervention.
However, usually the only way is manually restarting the instance and keeping fingers crossed that the correct infinispan cluster forms again.

Stack traces from hanging instances always show several interesting features:

  1. One group of threads (http workers) pile up on a condition deep inside JGroup's flow control implementation; the application's last call in those cases usually is some operation on a distributed cache (e.g. remove()):

    http-threads - 178 awaiting notification on [ 0x0000000374059958 ]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
        at org.jgroups.util.CreditMap.decrement(CreditMap.java:157)
        at org.jgroups.protocols.MFC.handleDownMessage(MFC.java:104)
        at org.jgroups.protocols.FlowControl.down(FlowControl.java:341)
        at org.jgroups.protocols.FRAG2.down(FRAG2.java:148)
        at org.jgroups.protocols.RSVP.down(RSVP.java:143)
        at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1030)
        at org.jgroups.JChannel.down(JChannel.java:722)
        at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:618)
        at org.jgroups.blocks.RequestCorrelator.sendRequest(RequestCorrelator.java:174)
        at org.jgroups.blocks.GroupRequest.sendRequest(GroupRequest.java:360)
        at org.jgroups.blocks.GroupRequest.sendRequest(GroupRequest.java:103)
        at org.jgroups.blocks.Request.execute(Request.java:83)
        at org.jgroups.blocks.MessageDispatcher.cast(MessageDispatcher.java:337)
        at org.jgroups.blocks.MessageDispatcher.castMessage(MessageDispatcher.java:249)
        at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processCalls(CommandAwareRpcDispatcher.java:333)
        at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:146)
        at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.broadcastRemoteCommands(CommandAwareRpcDispatcher.java:197)
        at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:498)
        at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:173)
        at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:194)
        at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:251)
        at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:238)
        at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:233)
        at org.infinispan.remoting.rpc.RpcManagerImpl.broadcastRpcCommand(RpcManagerImpl.java:212)
        at org.infinispan.remoting.rpc.RpcManagerImpl.broadcastRpcCommand(RpcManagerImpl.java:204)
        at org.infinispan.interceptors.ReplicationInterceptor.handleCrudMethod(ReplicationInterceptor.java:307)
        at org.infinispan.interceptors.ReplicationInterceptor.visitRemoveCommand(ReplicationInterceptor.java:269)
        at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:73)
        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)
        at org.infinispan.interceptors.EntryWrappingInterceptor.invokeNextAndApplyChanges(EntryWrappingInterceptor.java:302)
        at org.infinispan.interceptors.EntryWrappingInterceptor.visitRemoveCommand(EntryWrappingInterceptor.java:207)
        at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:73)
        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)
        at org.infinispan.interceptors.locking.NonTransactionalLockingInterceptor.visitRemoveCommand(NonTransactionalLockingInterceptor.java:124)
        at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:73)
        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)
        at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:134)
        at org.infinispan.commands.AbstractVisitor.visitRemoveCommand(AbstractVisitor.java:67)
        at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:73)
        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)
        at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:134)
        at org.infinispan.commands.AbstractVisitor.visitRemoveCommand(AbstractVisitor.java:67)
        at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:73)
        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)
        at org.infinispan.statetransfer.StateTransferInterceptor.handleTopologyAffectedCommand(StateTransferInterceptor.java:284)
        at org.infinispan.statetransfer.StateTransferInterceptor.handleNonTxWriteCommand(StateTransferInterceptor.java:222)
        at org.infinispan.statetransfer.StateTransferInterceptor.visitRemoveCommand(StateTransferInterceptor.java:171)
        at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:73)
        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)
        at org.infinispan.interceptors.CacheMgmtInterceptor.visitRemoveCommand(CacheMgmtInterceptor.java:137)
        at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:73)
        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)
        at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:128)
        at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:92)
        at org.infinispan.commands.AbstractVisitor.visitRemoveCommand(AbstractVisitor.java:67)
        at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:73)
        at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:343)
        at org.infinispan.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1186)
        at org.infinispan.CacheImpl.removeInternal(CacheImpl.java:314)
        at org.infinispan.CacheImpl.remove(CacheImpl.java:308)
        at org.infinispan.CacheImpl.remove(CacheImpl.java:302)
        at org.infinispan.AbstractDelegatingCache.remove(AbstractDelegatingCache.java:313)
        at com.company.project.information.AuthenticationService.processChallengeImpl(AuthenticationService.java:155)
        at com.company.project.information.AuthenticationService.processChallengeForLogin(AuthenticationService.java:132)
        at com.company.project.information.AuthenticationService.respondChallenge(AuthenticationService.java:418)
        at com.company.project.security.auth.ServerLoginModule.verify(ServerLoginModule.java:280)
        at com.company.project.security.auth.ServerLoginModule._login(ServerLoginModule.java:166)
        at com.company.project.security.auth.ServerLoginModule.login(ServerLoginModule.java:87)
        at sun.reflect.GeneratedMethodAccessor487.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
        at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
        at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
        at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
        at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
        at org.jboss.security.authentication.JBossCachedAuthenticationManager.defaultLogin(JBossCachedAuthenticationManager.java:399)
        at org.jboss.security.authentication.JBossCachedAuthenticationManager.proceedWithJaasLogin(JBossCachedAuthenticationManager.java:338)
        at org.jboss.security.authentication.JBossCachedAuthenticationManager.authenticate(JBossCachedAuthenticationManager.java:326)
        at org.jboss.security.authentication.JBossCachedAuthenticationManager.isValid(JBossCachedAuthenticationManager.java:142)
        at org.jboss.as.security.service.SimpleSecurityManager.authenticate(SimpleSecurityManager.java:418)
        at org.jboss.as.security.service.SimpleSecurityManager.authenticate(SimpleSecurityManager.java:377)
        at org.jboss.as.ejb3.security.SecurityContextInterceptor$1.run(SecurityContextInterceptor.java:54)
        at org.jboss.as.ejb3.security.SecurityContextInterceptor$1.run(SecurityContextInterceptor.java:48)
        at org.jboss.as.ejb3.security.SecurityContextInterceptor.processInvocation(SecurityContextInterceptor.java:86)
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
        at org.jboss.as.ejb3.deployment.processors.StartupAwaitInterceptor.processInvocation(StartupAwaitInterceptor.java:22)
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
        at org.jboss.as.ejb3.component.interceptors.ShutDownInterceptorFactory$1.processInvocation(ShutDownInterceptorFactory.java:64)
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
        at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59)
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
        at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50)
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
        at org.jboss.as.ejb3.component.interceptors.AdditionalSetupInterceptor.processInvocation(AdditionalSetupInterceptor.java:55)
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
        at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45)
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
        at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
        at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:189)
        at org.jboss.as.ejb3.remote.LocalEjbReceiver.processInvocation(LocalEjbReceiver.java:271)
        at org.jboss.ejb.client.EJBClientInvocationContext.sendRequest(EJBClientInvocationContext.java:184)
        at org.jboss.ejb.client.EJBObjectInterceptor.handleInvocation(EJBObjectInterceptor.java:58)
        at org.jboss.ejb.client.EJBClientInvocationContext.sendRequest(EJBClientInvocationContext.java:186)
        at org.jboss.ejb.client.EJBHomeInterceptor.handleInvocation(EJBHomeInterceptor.java:83)
        at org.jboss.ejb.client.EJBClientInvocationContext.sendRequest(EJBClientInvocationContext.java:186)
        at org.jboss.ejb.client.TransactionInterceptor.handleInvocation(TransactionInterceptor.java:42)
        at org.jboss.ejb.client.EJBClientInvocationContext.sendRequest(EJBClientInvocationContext.java:186)
        at org.jboss.ejb.client.ReceiverInterceptor.handleInvocation(ReceiverInterceptor.java:125)
        at org.jboss.ejb.client.EJBClientInvocationContext.sendRequest(EJBClientInvocationContext.java:186)
        at com.company.project.module.service.CallIdPropagator$1.handleInvocation(CallIdPropagator.java:60)
        at org.jboss.ejb.client.EJBClientInvocationContext.sendRequest(EJBClientInvocationContext.java:186)
        at org.jboss.ejb.client.EJBInvocationHandler.sendRequestWithPossibleRetries(EJBInvocationHandler.java:255)
        at org.jboss.ejb.client.EJBInvocationHandler.doInvoke(EJBInvocationHandler.java:200)
        at org.jboss.ejb.client.EJBInvocationHandler.doInvoke(EJBInvocationHandler.java:183)
        at org.jboss.ejb.client.EJBInvocationHandler.invoke(EJBInvocationHandler.java:146)
        at com.sun.proxy.$Proxy653.hasOneOfThisNetworkFeatures(
        Unknown Source)
        at sun.reflect.GeneratedMethodAccessor773.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.cgm.life.clientservices.Invoker.doPost(Invoker.java:129)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:754)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:295)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:231)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:149)
        at org.jboss.as.jpa.interceptor.WebNonTxEmCloserValve.invoke(WebNonTxEmCloserValve.java:50)
        at org.jboss.as.jpa.interceptor.WebNonTxEmCloserValve.invoke(WebNonTxEmCloserValve.java:50)
        at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:169)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:150)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:97)
        at org.jboss.web.rewrite.RewriteValve.invoke(RewriteValve.java:466)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:102)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:344)
        at org.apache.coyote.ajp.AjpAprProcessor.process(AjpAprProcessor.java:475)
        at org.apache.coyote.ajp.AjpAprProtocol$AjpConnectionHandler.process(AjpAprProtocol.java:454)
        at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2562)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
        at org.jboss.threads.JBossThread.run(JBossThread.java:122)
    

    (full stack trace just to give you the whole picture). The condition in this example (0x0000000374059958) blocks hundrets of threads with differing stack traces; all of them share the call to some distributed cache, though.

  2. Another group of threads (seemingly UDP protocol handlers, of which we have a whole bunch) also are awaiting notification on an object. The context seems to be something with state transfer in the Infinispan cluster:

    Incoming-8,shared=udp awaiting notification on [ 0x00000003810b5188 ]
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at org.infinispan.statetransfer.StateTransferLockImpl.waitForTransactionData(StateTransferLockImpl.java:100)
        at org.infinispan.statetransfer.StateTransferInterceptor.updateTopologyIdAndWaitForTransactionData(StateTransferInterceptor.java:311)
        at org.infinispan.statetransfer.StateTransferInterceptor.handleTopologyAffectedCommand(StateTransferInterceptor.java:281)
        at org.infinispan.statetransfer.StateTransferInterceptor.handleNonTxWriteCommand(StateTransferInterceptor.java:222)
        at org.infinispan.statetransfer.StateTransferInterceptor.visitPutKeyValueCommand(StateTransferInterceptor.java:156)
        at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:82)
        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)
        at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:128)
        at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:92)
        at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:62)
        at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:82)
        at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:343)
        at org.infinispan.commands.remote.BaseRpcInvokingCommand.processVisitableCommand(BaseRpcInvokingCommand.java:61)
        at org.infinispan.commands.remote.SingleRpcCommand.perform(SingleRpcCommand.java:70)
        at org.infinispan.remoting.InboundInvocationHandlerImpl.handleInternal(InboundInvocationHandlerImpl.java:100)
        at org.infinispan.remoting.InboundInvocationHandlerImpl.handleWithWaitForBlocks(InboundInvocationHandlerImpl.java:121)
        at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:85)
        at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:247)
        at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:220)
        at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:484)
        at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:391)
        at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:249)
        at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:600)
        at org.jgroups.blocks.mux.MuxUpHandler.up(MuxUpHandler.java:130)
        at org.jgroups.JChannel.up(JChannel.java:707)
        at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1025)
        at org.jgroups.protocols.RSVP.up(RSVP.java:188)
        at org.jgroups.protocols.FRAG2.up(FRAG2.java:182)
        at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
        at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
        at org.jgroups.protocols.pbcast.GMS.up(GMS.java:897)
        at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:247)
        at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:453)
        at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:793)
        at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:609)
        at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:147)
        at org.jgroups.protocols.FD.up(FD.java:253)
        at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288)
        at org.jgroups.protocols.MERGE3.up(MERGE3.java:290)
        at org.jgroups.protocols.Discovery.up(Discovery.java:359)
        at org.jgroups.protocols.TP$ProtocolAdapter.up(TP.java:2616)
        at org.jgroups.protocols.TP.passMessageUp(TP.java:1269)
        at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1831)
        at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1804)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    

    Only 4 threads are waiting on this notification.

  3. However, there are even more threads (like, hundrets) waiting on different objects, but at the same position in the code:

    OOB-4950,shared=udp awaiting notification on [ 0x000000038bccd2b0 ]
    OOB-4951,shared=udp awaiting notification on [ 0x00000003bacf4930 ]
    OOB-4952,shared=udp awaiting notification on [ 0x000000038bccd2b0 ]
    OOB-4953,shared=udp awaiting notification on [ 0x000000038bccd2b0 ]
    OOB-4954,shared=udp awaiting notification on [ 0x00000003bacf4930 ]
    OOB-4956,shared=udp awaiting notification on [ 0x000000038bccd2b0 ]
    OOB-4958,shared=udp awaiting notification on [ 0x000000038bccd2b0 ]
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at org.infinispan.statetransfer.StateTransferLockImpl.waitForTransactionData(StateTransferLockImpl.java:100)
        at org.infinispan.statetransfer.StateTransferInterceptor.updateTopologyIdAndWaitForTransactionData(StateTransferInterceptor.java:311)
        at org.infinispan.statetransfer.StateTransferInterceptor.handleTopologyAffectedCommand(StateTransferInterceptor.java:281)
        at org.infinispan.statetransfer.StateTransferInterceptor.handleNonTxWriteCommand(StateTransferInterceptor.java:222)
        at org.infinispan.statetransfer.StateTransferInterceptor.visitRemoveCommand(StateTransferInterceptor.java:171)
        at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:73)
        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)
        at org.infinispan.interceptors.CacheMgmtInterceptor.visitRemoveCommand(CacheMgmtInterceptor.java:137)
        at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:73)
        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)
        at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:128)
        at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:92)
        at org.infinispan.commands.AbstractVisitor.visitRemoveCommand(AbstractVisitor.java:67)
        at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:73)
        at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:343)
        at org.infinispan.commands.remote.BaseRpcInvokingCommand.processVisitableCommand(BaseRpcInvokingCommand.java:61)
        at org.infinispan.commands.remote.SingleRpcCommand.perform(SingleRpcCommand.java:70)
        at org.infinispan.remoting.InboundInvocationHandlerImpl.handleInternal(InboundInvocationHandlerImpl.java:100)
        at org.infinispan.remoting.InboundInvocationHandlerImpl.handleWithWaitForBlocks(InboundInvocationHandlerImpl.java:121)
        at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:85)
        at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:247)
        at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:220)
        at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:484)
        at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:391)
        at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:249)
        at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:600)
        at org.jgroups.blocks.mux.MuxUpHandler.up(MuxUpHandler.java:130)
        at org.jgroups.JChannel.up(JChannel.java:707)
        at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1025)
        at org.jgroups.protocols.RSVP.up(RSVP.java:188)
        at org.jgroups.protocols.FRAG2.up(FRAG2.java:182)
        at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
        at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
        at org.jgroups.protocols.pbcast.GMS.up(GMS.java:897)
        at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:247)
        at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:453)
        at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:751)
        at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:609)
        at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:147)
        at org.jgroups.protocols.FD.up(FD.java:253)
        at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288)
        at org.jgroups.protocols.MERGE3.up(MERGE3.java:290)
        at org.jgroups.protocols.Discovery.up(Discovery.java:359)
        at org.jgroups.protocols.TP$ProtocolAdapter.up(TP.java:2616)
        at org.jgroups.protocols.TP.passMessageUp(TP.java:1269)
        at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1831)
        at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1804)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    
  4. There're also threads waiting for a lock that is held by a thread waiting for the condition in (1):

    Transaction Reaper Worker 9 waiting to acquire [ 0x0000000507058c88 ]                                    
        at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.afterCompletion(TwoPhaseCoordinator.java:356
        at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.afterCompletion(TwoPhaseCoordinator.java:334
        at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.cancel(TwoPhaseCoordinator.java:120)        
        at com.arjuna.ats.arjuna.AtomicAction.cancel(AtomicAction.java:215)                                  
        at com.arjuna.ats.arjuna.coordinator.TransactionReaper.doCancellations(TransactionReaper.java:370)   
        at com.arjuna.ats.internal.arjuna.coordinator.ReaperWorkerThread.run(ReaperWorkerThread.java:78)
    

    One of the http threads holds 0x0000000507058c88 that in turn waits for 0x0000000374059958

That's about it with respect to what I can read from the system. Something is seriously stuck inside, but I have no idea how to analyze or troubleshoot, let alone solve the problems. (We tried updating to EAP 6.4.20, but the same behavior could be observed here, so we rolled this update back due to other, newer, shinier problems that occured after the update).

I'll gladly provide more information (like specific config params) that can help to diagnose the problem.

Thanks for anyone helping out!

Best,
Markus


Solution

  • Turns out it makes a lot of sense to monitor your GC log for interesting events... x_X

    The situation was that we suffered Old-Gen fragmentation with CMS, and the Java Virtual Machine periodically decided to pause the application server for up to 45 seconds to do defragmentation.

    We switched to G1 and these pauses along with Infinispan-desynchronizations vanished as well. This setup is now running quite stable for the last couple of months.