I am running a self-hosted OKD 4 cluster with minimum production requirements (3 control planes and two compute nodes). This setup includes a Jenkins installation - installed via Helm (https://www.jenkins.io/doc/book/installing/kubernetes/) So far everything worked fine: builds start automatically when changes are pushed to Github and when they are successful are deployed to the same cluster where Jenkins runs in.
But currently I am facing the problem that when a build job executes a Spring Boot test which fires up a persistence context. The build agent (a jdk-11 image, see additionalAgent configuration below) gets killed as soon as Spring starts up the persistence context. Downloading dependencies and compilation works fine, btw.
additionalAgents:
jdk-11:
podName: jdk-11
customJenkinsLabels: jdk-11
image: jenkins/jnlp-agent-jdk11
tag: latest
...
When the tests are disabled the job runs fine. But as soon as the persistence gets initialised the agent gets killed.
Those are the configurations I have tried for the test:
The time where the job gets killed is almost the same: For 1. it is
2021-10-20 22:44:06.637 INFO 299 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Bootstrapping Spring Data JPA repositories in DEFAULT mode.
2021-10-20 22:44:07.032 INFO 299 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Finished Spring Data repository scanning in 310 ms. Found 2 JPA repository interfaces.
2021-10-20 22:44:08.240 INFO 299 --- [ main] o.s.cloud.context.scope.GenericScope : BeanFactory id=1c9e8306-7514-338e-8a9f-3cfba5c1169b
2021-10-20 22:44:10.527 INFO 299 --- [ main] o.f.c.internal.license.VersionPrinter : Flyway Community Edition 7.7.3 by Redgate
2021-10-20 22:44:10.532 INFO 299 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Starting...
2021-10-20 22:44:11.744 INFO 299 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Start completed.
2021-10-20 22:44:12.041 INFO 299 --- [ main] o.f.c.i.database.base.DatabaseType : Database: jdbc:h2:mem:testdb (H2 1.4)
Killed
For 2.
2021-10-21 19:50:51.604 INFO 306 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Bootstrapping Spring Data JPA repositories in DEFAULT mode.
2021-10-21 19:50:52.005 INFO 306 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Finished Spring Data repository scanning in 391 ms. Found 2 JPA repository interfaces.
2021-10-21 19:50:53.510 INFO 306 --- [ main] o.s.cloud.context.scope.GenericScope : BeanFactory id=0fd77ef3-b5a2-35cb-b157-6d27c0cfe9a5
2021-10-21 19:50:56.405 INFO 306 --- [ main] o.hibernate.jpa.internal.util.LogHelper : HHH000204: Processing PersistenceUnitInfo [name: default]
2021-10-21 19:50:56.708 INFO 306 --- [ main] org.hibernate.Version : HHH000412: Hibernate ORM core version 5.4.32.Final
2021-10-21 19:50:57.503 INFO 306 --- [ main] o.hibernate.annotations.common.Version : HCANN000001: Hibernate Commons Annotations {5.1.2.Final}
Killed
And for 3.
2021-10-21 22:02:48.810 INFO 309 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Bootstrapping Spring Data JPA repositories in DEFAULT mode.
2021-10-21 22:02:49.198 INFO 309 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Finished Spring Data repository scanning in 380 ms. Found 2 JPA repository interfaces.
2021-10-21 22:02:50.509 INFO 309 --- [ main] o.s.cloud.context.scope.GenericScope : BeanFactory id=0fd77ef3-b5a2-35cb-b157-6d27c0cfe9a5
2021-10-21 22:02:53.523 INFO 309 --- [ main] o.hibernate.jpa.internal.util.LogHelper : HHH000204: Processing PersistenceUnitInfo [name: default]
2021-10-21 22:02:53.898 INFO 309 --- [ main] org.hibernate.Version : HHH000412: Hibernate ORM core version 5.4.32.Final
Killed
The log of the Jenkins pod just states
Terminated Kubernetes instance for agent jenkins/jdk-11-bjtz5
Disconnected computer jdk-11-bjtz5
2021-10-21 22:02:57.342+0000 [id=465] INFO o.c.j.p.k.KubernetesSlave#deleteSlavePod: Terminated Kubernetes instance for agent jenkins/jdk-11-bjtz5
2021-10-21 22:02:57.342+0000 [id=465] INFO o.c.j.p.k.KubernetesSlave#_terminate: Disconnected computer jdk-11-bjtz5
2021-10-21 22:02:57.356+0000 [id=436] INFO j.s.DefaultJnlpSlaveReceiver#channelClosed: Computer.threadPoolForRemoting [#56] for jdk-11-bjtz5 terminated: java.nio.channels.ClosedChannelException
In all cases there are no exceptions, stacktraces or suspicious events. And these steps are reproducable - when I run the build with the same configuration again the agents gets killed at exactly the same step in the test.
The setup:
I am presuming that a little bit of configuration is missing to make this work. But I just cannot find what it could be. So I am asking: have had anyone the same issue here? Or any guesses what the missing part could be?
When there is some information missing please point it out and I will add it.
After a little step back I took a look into the actual pod which runs the build. And found out that the memory limit of the agents was the problem.
So increasing the limit solved the problem!
I have done that by modifying the local jenkins-values.yaml
and updated the limits
section of the agent:
block in it.
A little confusing for me was the fact that no log entry stated the exceed of memory usage.
Next thought is that I will set a memory limit for the test step via Java options to kill the maven process before the pod exceeds the limit. Guess it would be more transparent in the build.
And as a sidenote: the limit has been set to 512Mi previously and has been exceeded by ~10 MiB -.-
Luckily for me I have found it at this point and the other build-jobs where running fine was just due to the lack of resource-usage (haven't figured that just starting Hibernate would exceed the 512 MiB mark)