After upgrading Jenkins to version 2.375.4 and the AWS EKS Kubernetes cluster to v1.23, along with switching the container runtime from Docker to containerd, I sometimes get the following error on Jenkins jobs that run on the EKS cluster via a Jenkins agent:
03:39:51 java.nio.channels.ClosedChannelException
03:39:51 Also: hudson.remoting.Channel$CallSiteStackTrace: Remote call to JNLP4-connect connection from ip-10-20-53-103.eu-west-1.compute.internal/10.20.53.103:38004
03:39:51 at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1784)
03:39:51 at hudson.remoting.Request.call(Request.java:199)
03:39:51 at hudson.remoting.Channel.call(Channel.java:999)
03:39:51 at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:153)
03:39:51 at jdk.internal.reflect.GeneratedMethodAccessor1121.invoke(Unknown Source)
03:39:51 at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
03:39:51 at java.base/java.lang.reflect.Method.invoke(Method.java:566)
03:39:51 at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:138)
03:39:51 at com.sun.proxy.$Proxy262.execute(Unknown Source)
03:39:51 at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1359)
03:39:51 at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:129)
03:39:51 at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:97)
03:39:51 at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:84)
03:39:51 at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
03:39:51 at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
03:39:51 at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
03:39:51 Also: org.jenkinsci.plugins.workflow.actions.ErrorAction$ErrorId: df677487-c98d-4870-aa71-74faab41e552
03:39:51 Also: org.jenkinsci.plugins.workflow.support.steps.AgentOfflineException: Unable to create live FilePath for current-frontend-e2e-native-deps-9106-4ptkj-3hltr-q0stv; current-frontend-e2e-native-deps-9106-4ptkj-3hltr-q0stv was marked offline: Connection was broken
03:39:51 at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepDynamicContext$FilePathTranslator.get(ExecutorStepDynamicContext.java:182)
03:39:51 at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepDynamicContext$FilePathTranslator.get(ExecutorStepDynamicContext.java:154)
03:39:51 at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepDynamicContext$Translator.get(ExecutorStepDynamicContext.java:147)
03:39:51 at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepDynamicContext$FilePathTranslator.get(ExecutorStepDynamicContext.java:164)
03:39:51 at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepDynamicContext$FilePathTranslator.get(ExecutorStepDynamicContext.java:154)
03:39:51 at org.jenkinsci.plugins.workflow.steps.DynamicContext$Typed.get(DynamicContext.java:95)
03:39:51 at org.jenkinsci.plugins.workflow.cps.ContextVariableSet.get(ContextVariableSet.java:139)
03:39:51 at org.jenkinsci.plugins.workflow.cps.CpsThread.getContextVariable(CpsThread.java:137)
03:39:51 at org.jenkinsci.plugins.workflow.cps.CpsStepContext.doGet(CpsStepContext.java:297)
03:39:51 at org.jenkinsci.plugins.workflow.cps.CpsBodySubContext.doGet(CpsBodySubContext.java:88)
03:39:51 at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:75)
03:39:51 at org.jenkinsci.plugins.workflow.steps.CoreWrapperStep$Callback.finished(CoreWrapperStep.java:187)
03:39:51 at org.jenkinsci.plugins.workflow.steps.CoreWrapperStep$Execution2$Callback2.finished(CoreWrapperStep.java:150)
03:39:51 at org.jenkinsci.plugins.workflow.steps.GeneralNonBlockingStepExecution$TailCall.lambda$onFailure$1(GeneralNonBlockingStepExecution.java:156)
03:39:51 at org.jenkinsci.plugins.workflow.steps.GeneralNonBlockingStepExecution.lambda$run$0(GeneralNonBlockingStepExecution.java:77)
03:39:51 at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
03:39:51 at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
03:39:51 Caused: hudson.remoting.RequestAbortedException
03:39:51 at hudson.remoting.Request.abort(Request.java:346)
03:39:51 at hudson.remoting.Channel.terminate(Channel.java:1080)
03:39:51 at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:241)
03:39:51 at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:221)
03:39:51 at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:825)
03:39:51 at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:289)
03:39:51 at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:168)
03:39:51 at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:825)
03:39:51 at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:155)
03:39:51 at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:143)
03:39:51 at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:789)
03:39:51 at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:30)
03:39:51 at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:70)
03:39:51 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
03:39:51 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
03:39:51 at java.base/java.lang.Thread.run(Thread.java:829)
What is the reason for this error, and how can it be fixed?
Possible Solutions:
1. Make sure that your kubernetes-plugin is up to date and not on an outdated version.
2. Make sure that the Java version of your Jenkins controller matches the Java version of your Jenkins agents, to avoid any incompatibilities (see the check after this list).
3. Make sure that the pod is not being CPU-throttled and has enough CPU and/or memory. If it does not, increase one or both of them to fix this issue.
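For point 2, a quick way to compare the two JVMs (a sketch: the jenkins namespace is an assumption, and the pod name is the agent pod taken from the log above):

# The controller's Java version is listed under Manage Jenkins -> System Information.
# Compare it with the JVM inside the agent's jnlp container:
kubectl -n jenkins exec current-frontend-e2e-native-deps-9106-4ptkj-3hltr-q0stv \
  -c jnlp -- java -version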
How did I find solution no. 3?
Looking at the metrics of the containers of that job's pod in Grafana, I realised that CPU usage had reached 100%, which caused CPU throttling of the jnlp container. Increasing its CPU request and limit fixed the issue.
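If you don't have Grafana metrics at hand, the same throttling can be spotted from the pod itself (a sketch: the jenkins namespace is an assumption, kubectl top requires metrics-server, and the cgroup path depends on whether the node uses cgroup v1 or v2):

# Snapshot of per-container CPU/memory usage:
kubectl -n jenkins top pod current-frontend-e2e-native-deps-9106-4ptkj-3hltr-q0stv --containers

# Throttling counters from the container's own cgroup; a growing nr_throttled
# means the CFS quota (i.e. the CPU limit) is being hit.
# cgroup v2 path shown; on cgroup v1 read /sys/fs/cgroup/cpu/cpu.stat instead.
kubectl -n jenkins exec current-frontend-e2e-native-deps-9106-4ptkj-3hltr-q0stv \
  -c jnlp -- cat /sys/fs/cgroup/cpu.stat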
Old Configuration:
resources:
  limits:
    cpu: "2"
    memory: "2Gi"
  requests:
    cpu: "2"
    memory: "2Gi"
New Configuration:
resources:
  limits:
    cpu: "3"
    memory: "2Gi"
  requests:
    cpu: "3"
    memory: "2Gi"
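This resources block sits under the jnlp container in the kubernetes-plugin pod template. Note that requests and limits are kept equal: when every container in the pod does this for both CPU and memory, the pod runs in the Guaranteed QoS class, and the CPU limit is what the CFS quota (and therefore throttling) is derived from, so raising it is what stops the jnlp container from being throttled.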