We have just recently started experiencing a problem with the Spring Boot (v2.7.4) Undertow server (Undertow 2.2.22.Final).
After running fine for some time, the server becomes unresponsive to HTTP requests.
This is running in Docker, based on the Java 11 Temurin image (11.0.18_10-jdk-focal).
Running `top -H -p 9` inside the container (PID 9 is the JVM) shows three XNIO I/O threads each pinned at close to 100% CPU:
```
top - 15:54:59 up 2 days, 5:10, 0 users, load average: 3.00, 3.04, 3.07
Threads: 104 total, 3 running, 101 sleeping, 0 stopped, 0 zombie
%Cpu(s): 42.3 us, 33.4 sy, 0.0 ni, 24.1 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
MiB Mem : 31790.2 total, 17321.1 free, 7058.5 used, 7410.6 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 24271.6 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
130 user 20 0 16.2g 3.6g 28156 R 99.9 11.6 304:41.84 XNIO-1 I/O-3
128 user 20 0 16.2g 3.6g 28156 R 99.9 11.6 210:30.67 XNIO-1 I/O-1
131 user 20 0 16.2g 3.6g 28156 R 99.7 11.6 91:28.80 XNIO-1 I/O-4
15 user 20 0 16.2g 3.6g 28156 S 0.3 11.6 1:08.93 G1 Young RemSet
9 user 20 0 16.2g 3.6g 28156 S 0.0 11.6 0:00.00 java
10 user 20 0 16.2g 3.6g 28156 S 0.0 11.6 0:22.31 java
11 user 20 0 16.2g 3.6g 28156 S 0.0 11.6 0:08.58 GC Thread#0
12 user 20 0 16.2g 3.6g 28156 S 0.0 11.6 0:00.00 G1 Main Marker
13 user 20 0 16.2g 3.6g 28156 S 0.0 11.6 0:00.63 G1 Conc#0
14 user 20 0 16.2g 3.6g 28156 S 0.0 11.6 0:00.09 G1 Refine#0
16 user 20 0 16.2g 3.6g 28156 S 0.0 11.6 0:09.77 VM Thread
17 user 20 0 16.2g 3.6g 28156 S 0.0 11.6 0:00.00 Reference Handl
```
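To tie the busy TIDs above to a thread dump taken with `jstack 9` or `jcmd 9 Thread.print`, they need converting to hex: `top -H` reports native thread IDs in decimal, while the thread dump prints the same ID as a hexadecimal `nid` (for example, TID 130 shows up as `nid=0x82`). A trivial sketch; `toNid` and `busyXnioTids` are illustrative names only:

```kotlin
// Hypothetical helper: top -H shows native thread IDs (TIDs) in decimal,
// while jstack / jcmd Thread.print show the same ID as a hexadecimal "nid".
fun toNid(tid: Int): String = "0x" + tid.toString(16)

// TIDs of the three busy XNIO I/O threads in the top output
val busyXnioTids = listOf(130, 128, 131)

// busyXnioTids.map(::toNid) == listOf("0x82", "0x80", "0x83")
```

Searching the dump for those `nid` values gives the stack each spinning I/O thread is executing.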
Any idea what might be causing these threads to exhibit this behaviour?
This is the only Undertow configuration we have:
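```kotlin
import io.undertow.UndertowOptions
import org.springframework.boot.web.embedded.undertow.UndertowBuilderCustomizer
import org.springframework.boot.web.embedded.undertow.UndertowServletWebServerFactory
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration

@Configuration
class ServerConfiguration {

    @Bean
    fun embeddedServletContainerFactory(): UndertowServletWebServerFactory {
        val factory = UndertowServletWebServerFactory()
        // Enable HTTP/2 on the Undertow listeners
        factory.addBuilderCustomizers(UndertowBuilderCustomizer { builder ->
            builder.setServerOption(UndertowOptions.ENABLE_HTTP2, true)
        })
        return factory
    }
}
```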
Attaching VisualVM and capturing the moment this occurs:
[VisualVM screenshot: the moment the problem occurs]
Capturing the problem threads gives this:
[VisualVM screenshot: the problem threads]
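Where VisualVM cannot be attached, a rough in-process alternative is to sample per-thread CPU time with the JDK's `ThreadMXBean` and keep the stacks of the busiest XNIO threads. This is a sketch, assuming the I/O threads keep the `XNIO-` name prefix seen in the `top` output; `BusyThread` and `sampleBusyThreads` are hypothetical names. Since the HTTP server itself may be unresponsive, the result would have to be logged (for example from a scheduled task) rather than served over HTTP.

```kotlin
import java.lang.management.ManagementFactory
import java.lang.management.ThreadMXBean

// Hypothetical record: a thread, the CPU it used during the window, and its stack at the end of the window.
data class BusyThread(val id: Long, val name: String, val cpuMillis: Long, val stack: List<StackTraceElement>)

/**
 * Samples per-thread CPU time twice, [windowMillis] apart, and returns the threads whose
 * names start with [namePrefix], sorted by CPU consumed in the window (highest first).
 */
fun sampleBusyThreads(namePrefix: String = "XNIO-", windowMillis: Long = 1_000): List<BusyThread> {
    val mx: ThreadMXBean = ManagementFactory.getThreadMXBean()
    if (!mx.isThreadCpuTimeSupported) return emptyList()
    if (!mx.isThreadCpuTimeEnabled) mx.isThreadCpuTimeEnabled = true

    // First snapshot: CPU nanoseconds per live thread id (-1 if the thread has already exited).
    val before = mx.allThreadIds.toList().associateWith { mx.getThreadCpuTime(it) }
    Thread.sleep(windowMillis)

    return before.mapNotNull { (id, startNanos) ->
        // null means the thread exited during the window
        val info = mx.getThreadInfo(id, Int.MAX_VALUE) ?: return@mapNotNull null
        val endNanos = mx.getThreadCpuTime(id)
        if (startNanos < 0 || endNanos < 0 || !info.threadName.startsWith(namePrefix)) {
            return@mapNotNull null
        }
        BusyThread(id, info.threadName, (endNanos - startNanos) / 1_000_000, info.stackTrace.toList())
    }.sortedByDescending { it.cpuMillis }
}
```

An I/O thread that reports close to `windowMillis` of CPU in every sample, with a similar stack each time, is the kind of spinning thread shown by `top` above.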
Possibly related:
- Changes in SSLEngine usage when going up to TLSv1.3?
  Nope: adding `-Djdk.tls.acknowledgeCloseNotify=true` does not solve the problem.
- TLSv1.3 issue? https://github.com/undertow-io/undertow/pull/721
  Nope: disabling TLSv1.3 does not solve the problem.
- Java 11.0.18 issue?
  Nope: downgrading to JDK 11.0.16 does not solve the problem.
I have found https://issues.redhat.com/browse/UNDERTOW-2239, which looks very similar.
I am upgrading to Undertow 2.2.24.Final, which contains the fix, and will share the results.
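Spring Boot manages the Undertow version through its dependency management, so one way to do the upgrade is to override that managed version rather than declaring Undertow artifacts by hand. A minimal sketch, assuming a Gradle Kotlin DSL build with the `org.springframework.boot` and `io.spring.dependency-management` plugins applied (the question does not say which build tool is used; with Maven the equivalent is setting the `undertow.version` property in the POM):

```kotlin
// build.gradle.kts (sketch). Assumes the Spring Boot and dependency-management
// plugins are already applied, so the managed Undertow version can be
// overridden through the "undertow.version" property.
extra["undertow.version"] = "2.2.24.Final"

dependencies {
    // Undertow instead of the default Tomcat starter (assumed to match the existing setup)
    implementation("org.springframework.boot:spring-boot-starter-web") {
        exclude(group = "org.springframework.boot", module = "spring-boot-starter-tomcat")
    }
    implementation("org.springframework.boot:spring-boot-starter-undertow")
}
```

Running `./gradlew dependencyInsight --dependency undertow-core --configuration runtimeClasspath` afterwards should show which Undertow version is actually resolved.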
I believe this problem has been fixed in Undertow 2.2.24.Final.
Our server has now been running solidly for two days, which we had not managed since this problem first appeared.
See this Jira for details: https://issues.redhat.com/browse/UNDERTOW-2239