spring-boot ssl java-11 undertow xnio

Undertow becomes unresponsive


We have recently started experiencing a problem with the Spring Boot (v2.7.4) embedded Undertow server (version 2.2.22.Final).

After running fine for some time, the server becomes unresponsive to HTTP requests.

This is running in Docker, based on the Java 11 Temurin image (11.0.18_10-jdk-focal).

Running top -H -p 9 shows three XNIO I/O threads each pinned at close to 100% CPU:

top - 15:54:59 up 2 days,  5:10,  0 users,  load average: 3.00, 3.04, 3.07
Threads: 104 total,   3 running, 101 sleeping,   0 stopped,   0 zombie
%Cpu(s): 42.3 us, 33.4 sy,  0.0 ni, 24.1 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
MiB Mem :  31790.2 total,  17321.1 free,   7058.5 used,   7410.6 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  24271.6 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                               
  130 user      20   0   16.2g   3.6g  28156 R  99.9  11.6 304:41.84 XNIO-1 I/O-3                                                                                          
  128 user      20   0   16.2g   3.6g  28156 R  99.9  11.6 210:30.67 XNIO-1 I/O-1                                                                                          
  131 user      20   0   16.2g   3.6g  28156 R  99.7  11.6  91:28.80 XNIO-1 I/O-4                                                                                          
   15 user      20   0   16.2g   3.6g  28156 S   0.3  11.6   1:08.93 G1 Young RemSet                                                                                       
    9 user      20   0   16.2g   3.6g  28156 S   0.0  11.6   0:00.00 java                                                                                                  
   10 user      20   0   16.2g   3.6g  28156 S   0.0  11.6   0:22.31 java                                                                                                  
   11 user      20   0   16.2g   3.6g  28156 S   0.0  11.6   0:08.58 GC Thread#0                                                                                           
   12 user      20   0   16.2g   3.6g  28156 S   0.0  11.6   0:00.00 G1 Main Marker                                                                                        
   13 user      20   0   16.2g   3.6g  28156 S   0.0  11.6   0:00.63 G1 Conc#0                                                                                             
   14 user      20   0   16.2g   3.6g  28156 S   0.0  11.6   0:00.09 G1 Refine#0                                                                                           
   16 user      20   0   16.2g   3.6g  28156 S   0.0  11.6   0:09.77 VM Thread                                                                                             
   17 user      20   0   16.2g   3.6g  28156 S   0.0  11.6   0:00.00 Reference Handl    
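
To line these busy threads up with a thread dump, the thread IDs in the PID column can be converted to the hexadecimal nid values that jstack prints. A minimal sketch, assuming the dump is taken inside the same container so the IDs match; toNid is a hypothetical helper name, not part of any library:

```kotlin
// Hypothetical helper: converts a decimal thread ID from `top -H` into the
// hexadecimal "nid" form that appears in jstack thread dumps.
fun toNid(tid: Int): String = "0x" + Integer.toHexString(tid)

fun main() {
    // Thread IDs of the three busy XNIO I/O threads in the top output above.
    for (tid in listOf(130, 128, 131)) {
        println("$tid -> nid=${toNid(tid)}")
    }
}
```

Searching the dump for the printed nid values should locate the stack traces of the spinning I/O threads.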

Any idea what might be causing these threads to exhibit this behaviour?

This is the only server configuration we have for Undertow:

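import io.undertow.UndertowOptions
import org.springframework.boot.web.embedded.undertow.UndertowBuilderCustomizer
import org.springframework.boot.web.embedded.undertow.UndertowServletWebServerFactory
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration

@Configuration
class ServerConfiguration {
    @Bean
    fun embeddedServletContainerFactory(): UndertowServletWebServerFactory {
        val factory = UndertowServletWebServerFactory()
        // The only customization: enable HTTP/2 on the embedded Undertow server.
        factory.addBuilderCustomizers(UndertowBuilderCustomizer { builder ->
            builder.setServerOption(UndertowOptions.ENABLE_HTTP2, true)
        })
        return factory
    }
}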

Attaching VisualVM and capturing the moment this occurs:

[Screenshot: VisualVM Monitor capture]

Capturing the problem threads gives this:

[Screenshot: VisualVM capture]

Possibly related: Changes in SSLEngine usage when going up to TLSv1.3
Nope. Adding -Djdk.tls.acknowledgeCloseNotify=true does not solve the problem.

TLSv1.3 issue? https://github.com/undertow-io/undertow/pull/721
Nope. Disabling TLSv1.3 does not solve the problem.

Java 11.0.18 issue?
Nope. Downgrading to JDK 11.0.16 does not solve the problem.

I have found https://issues.redhat.com/browse/UNDERTOW-2239, which looks very similar.

I am upgrading to 2.2.24.Final, which includes this fix, and will report back with the results.


Solution

  • I believe this problem has been fixed in version 2.2.24.Final of Undertow.

    Our server has now been running solidly for two days, which we had not managed since this problem first appeared.

    See this Jira issue for details: https://issues.redhat.com/browse/UNDERTOW-2239
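
    If the Undertow version is managed by Spring Boot's dependency management, one way to pick up the fixed release is to override the undertow.version property. A minimal sketch for a Gradle Kotlin DSL build, assuming the io.spring.dependency-management plugin is applied (the question does not say which build tool is used):

```kotlin
// build.gradle.kts (assumed build setup; not stated in the question)
// Overrides the Undertow version managed by the Spring Boot BOM.
extra["undertow.version"] = "2.2.24.Final"
```

    With a different build setup, the same version would need to be pinned by whatever mechanism that build uses for managed dependency versions.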