Tags: java, multithreading, oracle-database, weblogic, weblogic12c

weblogic.socket.Muxer uses 100% cpu


We recently started experimenting with deployments in WebLogic 12c using the weblogic.Deployer utility. We can deploy an EAR fine, but whenever we undeploy that application while the Managed Server is still running, the server starts using 100% of our CPU (4-core Xeon, bare metal).

After some tinkering and countless thread dumps, we isolated the problem to 4 stuck threads, each consuming 100% of one core. The load average jumps from around 0.10 to 4.00 within 5 minutes.

This is one of the threads that seem to be stuck:

"ExecuteThread: '3' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00007fb52801c800 nid=0x6bf0 runnable [0x00007fb58a0ad000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        - locked <0x00000000e18c66d0> (a sun.nio.ch.Util$2)
        - locked <0x00000000e18c66c0> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00000000e18c6598> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:102)
        at weblogic.socket.NIOSocketMuxer.selectFrom(NIOSocketMuxer.java:541)
        at weblogic.socket.NIOSocketMuxer.processSockets(NIOSocketMuxer.java:470)
        at weblogic.socket.SocketReaderRequest.run(SocketReaderRequest.java:30)
        at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:43)
        at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:147)
        at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:119)
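
To tie a thread dump entry like the one above to a CPU-hungry thread at the OS level, the `nid` field can be converted from hex to decimal. On Linux with HotSpot, `nid` is the native thread id, which tools such as `top -H -p <pid>` list in decimal. The following is a minimal sketch of that conversion; the class and method names (`NidDecoder`, `nativeThreadId`) are made up for illustration and are not part of WebLogic or the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the native thread id from a HotSpot thread dump header.
final class NidDecoder {
    // Matches the hex value that follows "nid=0x" in a thread dump header line.
    private static final Pattern NID = Pattern.compile("nid=0x([0-9a-fA-F]+)");

    /** Returns the native thread id in decimal, or -1 if the line has no nid field. */
    static long nativeThreadId(String headerLine) {
        Matcher m = NID.matcher(headerLine);
        return m.find() ? Long.parseLong(m.group(1), 16) : -1L;
    }

    public static void main(String[] args) {
        String line = "\"ExecuteThread: '3' for queue: 'weblogic.socket.Muxer'\" daemon prio=10 "
                + "tid=0x00007fb52801c800 nid=0x6bf0 runnable [0x00007fb58a0ad000]";
        System.out.println(nativeThreadId(line)); // prints 27632
    }
}
```

The decimal value can then be compared with the thread ids reported by the OS to confirm which Java thread is burning the core.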

I've seen many people with the same problem (not with WebLogic, though):

https://github.com/netty/netty/issues/327

https://issues.jboss.org/browse/XNIO-172

Why does select() consume so much CPU time in my program?
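
The reports linked above appear to revolve around a selector loop that keeps returning without any ready keys and therefore spins. As a rough illustration of how such spinning could be detected in one's own NIO code, here is a sketch that counts consecutive `select()` calls that return zero keys well before the timeout. It is not WebLogic's muxer code; `SelectSpinDetector`, the threshold, and the "less than half the timeout" rule are assumptions chosen for the example.

```java
import java.io.IOException;
import java.nio.channels.Selector;

// Hypothetical helper: flags a selector loop that returns early with no ready keys.
final class SelectSpinDetector {
    private final long timeoutNanos;
    private final int threshold;
    private int prematureReturns;

    SelectSpinDetector(long timeoutMillis, int threshold) {
        this.timeoutNanos = timeoutMillis * 1_000_000L;
        this.threshold = threshold;
    }

    /**
     * Records one select() call. Returns true once `threshold` consecutive calls
     * came back with zero ready keys in less than half the requested timeout.
     */
    boolean record(int readyKeys, long elapsedNanos) {
        if (readyKeys == 0 && elapsedNanos < timeoutNanos / 2) {
            prematureReturns++;
        } else {
            prematureReturns = 0; // a normal timeout or real I/O resets the streak
        }
        return prematureReturns >= threshold;
    }

    public static void main(String[] args) throws IOException {
        long timeoutMillis = 200;
        SelectSpinDetector detector = new SelectSpinDetector(timeoutMillis, 3);
        try (Selector selector = Selector.open()) {
            for (int i = 0; i < 5; i++) {
                long start = System.nanoTime();
                int ready = selector.select(timeoutMillis); // blocks up to the timeout
                boolean spinning = detector.record(ready, System.nanoTime() - start);
                System.out.println("ready=" + ready + " spinning=" + spinning);
            }
        }
    }
}
```

In application code, a positive result would typically be the trigger for logging or for recreating the selector. Inside an application server the loop belongs to the server itself, so this only helps to understand the symptom.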

I don't think this is caused by an old JDK version. java -version says:

java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

I googled a bit but did not find anything on this. Do any WebLogic experts know what could be causing this problem?

Thanks a lot!


Solution

  • After much tinkering, an almost sleepless night and googling till I bled, I'm almost sure I got it solved.

    This solution is heavily based on another thread: https://stackoverflow.com/a/7827952/1484232

    To summarize the whole shebang: contention between GC threads was (most likely) causing the issue here. After I applied the following parameters to my JVM, the problem went away.

    -XX:+UseConcMarkSweepGC 
    -XX:+UseParNewGC 
    -XX:ParallelCMSThreads=2 
    -XX:+CMSParallelRemarkEnabled 
    -XX:+CMSIncrementalMode 
    -XX:+CMSIncrementalPacing 
    -XX:CMSFullGCsBeforeCompaction=1 
    -XX:+CMSClassUnloadingEnabled 
    -XX:CMSInitiatingOccupancyFraction=80
    

    If anyone ever runs into the same trouble, these flags are worth a try.

    Cheers.
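
To check that the GC flags from the answer were actually picked up by the Managed Server JVM, the standard `java.lang.management` API can list the startup arguments and the active collectors. This is a minimal sketch that assumes the code can be run inside the target JVM (for example from a small test servlet); the class name `GcFlagCheck` and the helper `xxFlags` are made up for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: prints the -XX flags and garbage collectors of the running JVM.
final class GcFlagCheck {
    /** Returns only the arguments that start with -XX:, in their original order. */
    static List<String> xxFlags(List<String> jvmArgs) {
        List<String> out = new ArrayList<String>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-XX:")) {
                out.add(arg);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM at startup (not the arguments of main()).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String flag : xxFlags(jvmArgs)) {
            System.out.println("flag: " + flag);
        }
        // One MXBean per collector that this JVM is running with.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("collector: " + gc.getName()
                    + ", collections so far: " + gc.getCollectionCount());
        }
    }
}
```

If the flags took effect, the collector names should reflect the CMS setup rather than the default collectors; the exact names depend on the JVM build.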