My application sometimes hangs at startup because of what looks like a deadlock related to InetAddress.getByName,
and it's not clear to me how to fix it.
To give some context, the code running in the two threads involved is not directly under my control.
The relevant code of the first thread is:
new InetSocketAddress("0.0.0.0", somePort)
And the second:
static final InetAddress INET6_ANY = InetAddress.getByName("::")
static final InetAddress INET_ANY = InetAddress.getByName("0.0.0.0")
I've read that using InetAddress may involve some blocking, but why would it hang forever? Especially as we're referring to the local address 0.0.0.0
and not some remote address.
This app is running in a container in Kubernetes, in case that explains anything.
$ cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.42.126.184 my-app-756f44d67-tgr5b
Note that this is not always reproducible, but we've seen several occurrences lately.
Could this be a bug, or am I somehow misusing the libraries? Or am I maybe missing something that must be defined for such code to work in a Kubernetes context?
For completeness, here is the thread dump for these 2 threads.
The one BLOCKED:
"ZScheduler-Worker-6" #30 daemon prio=5 os_prio=0 cpu=276.23ms elapsed=7541.20s tid=0x00007f7bd54d4880 nid=0x55 waiting for monitor entry [0x00007f7b663f4000]
java.lang.Thread.State: BLOCKED (on object monitor)
at jdk.internal.loader.NativeLibraries.loadLibrary(java.base@17.0.10/Unknown Source)
- waiting to lock <0x00000000a02e9a90> (a java.util.HashSet)
at jdk.internal.loader.NativeLibraries.loadLibrary(java.base@17.0.10/Unknown Source)
at jdk.internal.loader.NativeLibraries.findFromPaths(java.base@17.0.10/Unknown Source)
at jdk.internal.loader.NativeLibraries.loadLibrary(java.base@17.0.10/Unknown Source)
at jdk.internal.loader.NativeLibraries.loadLibrary(java.base@17.0.10/Unknown Source)
at jdk.internal.loader.BootLoader.loadLibrary(java.base@17.0.10/Unknown Source)
at java.net.InetAddress.<clinit>(java.base@17.0.10/Unknown Source)
at java.net.InetSocketAddress.<init>(java.base@17.0.10/Unknown Source)
at io.prometheus.metrics.exporter.httpserver.HTTPServer$Builder.makeInetSocketAddress(HTTPServer.java:209)
at io.prometheus.metrics.exporter.httpserver.HTTPServer$Builder.buildAndStart(HTTPServer.java:197)
at io.opentelemetry.exporter.prometheus.PrometheusHttpServer.<init>(PrometheusHttpServer.java:71)
at io.opentelemetry.exporter.prometheus.PrometheusHttpServerBuilder.build(PrometheusHttpServerBuilder.java:68)
at com.myapp.metrics.sdk.PrometheusMetricReader$.$anonfun$startReader$2(PrometheusMetricReader.scala:21)
at com.myapp.metrics.sdk.PrometheusMetricReader$$$Lambda$1109/0x00007f7b7840e078.apply(Unknown Source)
at zio.ZIOCompanionVersionSpecific.$anonfun$attempt$1(ZIOCompanionVersionSpecific.scala:100)
at zio.ZIOCompanionVersionSpecific$$Lambda$430/0x00007f7b782ba000.apply(Unknown Source)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:904)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:967)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:967)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
at zio.internal.FiberRuntime.evaluateEffect(FiberRuntime.scala:381)
at zio.internal.FiberRuntime.evaluateMessageWhileSuspended(FiberRuntime.scala:504)
at zio.internal.FiberRuntime.drainQueueOnCurrentThread(FiberRuntime.scala:220)
at zio.internal.FiberRuntime.run(FiberRuntime.scala:139)
at zio.internal.ZScheduler$$anon$4.run(ZScheduler.scala:478)
The one "locking":
"ZScheduler-Worker-20" #44 daemon prio=5 os_prio=0 cpu=191.27ms elapsed=7541.20s tid=0x00007f7bd54e2e70 nid=0x63 in Object.wait() [0x00007f7b655e3000]
java.lang.Thread.State: RUNNABLE
at io.netty.channel.epoll.LinuxSocket.unsafeInetAddrByName(LinuxSocket.java:364)
- waiting on the Class initialization monitor for java.net.InetAddress
at io.netty.channel.epoll.LinuxSocket.<clinit>(LinuxSocket.java:42)
at jdk.internal.loader.NativeLibraries.load(java.base@17.0.10/Native Method)
at jdk.internal.loader.NativeLibraries$NativeLibraryImpl.open(java.base@17.0.10/Unknown Source)
at jdk.internal.loader.NativeLibraries.loadLibrary(java.base@17.0.10/Unknown Source)
- locked <0x00000000a02e9a90> (a java.util.HashSet)
at jdk.internal.loader.NativeLibraries.loadLibrary(java.base@17.0.10/Unknown Source)
at java.lang.ClassLoader.loadLibrary(java.base@17.0.10/Unknown Source)
at java.lang.Runtime.load0(java.base@17.0.10/Unknown Source)
at java.lang.System.load(java.base@17.0.10/Unknown Source)
at io.netty.util.internal.NativeLibraryUtil.loadLibrary(NativeLibraryUtil.java:36)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@17.0.10/Native Method)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@17.0.10/Unknown Source)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@17.0.10/Unknown Source)
at java.lang.reflect.Method.invoke(java.base@17.0.10/Unknown Source)
at io.netty.util.internal.NativeLibraryLoader$1.run(NativeLibraryLoader.java:430)
at java.security.AccessController.executePrivileged(java.base@17.0.10/Unknown Source)
at java.security.AccessController.doPrivileged(java.base@17.0.10/Unknown Source)
at io.netty.util.internal.NativeLibraryLoader.loadLibraryByHelper(NativeLibraryLoader.java:422)
at io.netty.util.internal.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:388)
at io.netty.util.internal.NativeLibraryLoader.load(NativeLibraryLoader.java:218)
at io.netty.channel.epoll.Native.loadNativeLibrary(Native.java:323)
at io.netty.channel.epoll.Native.<clinit>(Native.java:85)
at io.netty.channel.epoll.Epoll.<clinit>(Epoll.java:40)
at zio.http.netty.ChannelFactories$Client$.$anonfun$fromConfig$4(ChannelFactories.scala:83)
at zio.http.netty.ChannelFactories$Client$$$Lambda$966/0x00007f7b783d3e60.apply(Unknown Source)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:967)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:967)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:967)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:967)
at zio.internal.FiberRuntime.evaluateEffect(FiberRuntime.scala:381)
at zio.internal.FiberRuntime.evaluateMessageWhileSuspended(FiberRuntime.scala:504)
at zio.internal.FiberRuntime.drainQueueOnCurrentThread(FiberRuntime.scala:220)
at zio.internal.FiberRuntime.run(FiberRuntime.scala:139)
at zio.internal.ZScheduler$$anon$4.run(ZScheduler.scala:478)
This has been fixed in the Netty repository and will be included in Netty 4.1.108.Final.
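From your thread dumps, I can see the following:
The first thread (ZScheduler-Worker-6) is running the class initializer of InetAddress, which waits for a native library to be loaded, but it cannot proceed because another thread is loading a native library and holds the lock it needs.
The second thread (ZScheduler-Worker-20) holds that lock and needs InetAddress as part of loading a native library. Specifically, the class initializer of Netty's Native class loads a native library (netty_transport_native_epoll), which in turn makes an upcall to LinuxSocket (or at least initializes it), which requires InetAddress to be initialized. That initialization is already in progress on the first thread, so the second thread waits for it.
So the problem is that Netty uses InetAddress while loading a native library, and this can happen while another thread is still initializing InetAddress. Each thread then waits for something the other one holds.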
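The cycle described above can be shown in miniature with two ordinary locks. This is only an analogy under stated assumptions, not the JDK's or Netty's code: classInitLock and nativeLibLock are hypothetical stand-ins for the InetAddress class-initialization lock and the native-library lock from the dump, and tryLock with a timeout is used so the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderDemo {
    // Hypothetical stand-ins for the two locks seen in the thread dump.
    static final ReentrantLock classInitLock = new ReentrantLock();
    static final ReentrantLock nativeLibLock = new ReentrantLock();

    /** Takes `first`, waits until the other thread holds its own first lock, then tries `second`. */
    static boolean acquireInOrder(ReentrantLock first, ReentrantLock second,
                                  CountDownLatch bothHoldFirst, CountDownLatch bothTried)
            throws InterruptedException {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();              // both threads now hold their first lock
            boolean gotSecond = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (gotSecond) {
                second.unlock();
            }
            bothTried.countDown();
            bothTried.await();                  // keep `first` held until both attempts are over
            return gotSecond;
        } finally {
            first.unlock();
        }
    }

    /** Returns true if neither thread could obtain its second lock (the deadlock pattern). */
    static boolean demonstratesCycle() throws InterruptedException {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] got = new boolean[2];

        // Like the first thread: inside class initialization, wants the library lock.
        Thread t1 = new Thread(() -> {
            try {
                got[0] = acquireInOrder(classInitLock, nativeLibLock, bothHoldFirst, bothTried);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // Like the second thread: holds the library lock, wants the initialized class.
        Thread t2 = new Thread(() -> {
            try {
                got[1] = acquireInOrder(nativeLibLock, classInitLock, bothHoldFirst, bothTried);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return !got[0] && !got[1];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("cycle: " + demonstratesCycle());
    }
}
```

In the real application there is no timeout on either wait, which is why the two threads stay stuck for the lifetime of the process.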
As a workaround, you can make sure that InetAddress is fully initialized before giving Netty a chance to do anything. You can do that by calling InetAddress.getLocalHost(); at the beginning of your main. That runs before Netty is used anywhere, and it forces InetAddress to be initialized.
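A minimal sketch of that workaround, assuming a plain Java entry point. The class name StartupWarmup and the helper initInetAddress are hypothetical names used for illustration; in a ZIO application the equivalent call would have to run before any effect that touches Netty or the metrics exporter.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class StartupWarmup {
    /**
     * Forces java.net.InetAddress to complete its static initialization
     * on the calling thread, before any other thread can race with it.
     */
    static void initInetAddress() {
        try {
            InetAddress.getLocalHost();
        } catch (UnknownHostException e) {
            // The lookup result does not matter here. The class initializer
            // has already run by the time this exception can be thrown.
        }
    }

    public static void main(String[] args) {
        initInetAddress();
        // Start everything else (Netty clients, metrics exporter, ...) only after this point.
        System.out.println("InetAddress initialized");
    }
}
```

The important design point is ordering: the call has to happen on a single thread before any concurrent startup work begins, otherwise it just becomes one more participant in the same race.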
You can file a bug report with the Netty team (or even write a pull request yourself).
One solution they could implement is to initialize InetAddress themselves before loading native libraries that rely on it being loaded or loadable.
For example, they could add an InetAddress.getLocalHost(); call to the Native class before it actually loads anything (e.g. at the beginning of Native.loadNativeLibrary).
Alternatively, it might even be possible to change things so that loading the native library doesn't require InetAddress at all. However, I don't know enough about Netty's internals to judge that.