Tags: java, tomcat, spring-boot, kubernetes, spring-boot-actuator

Kubernetes liveness - Reserve threads/memory for a specific endpoint with Spring Boot


Is it possible to reserve threads/memory for a specific endpoint in a Spring Boot microservice, and if so, how?

I have a microservice that accepts HTTP requests via Spring MVC. Those requests trigger HTTP calls to a third-party system, which is sometimes partially degraded and responds very slowly. I can't reduce the timeout because some calls are very slow by nature.

I have the spring-boot-actuator /health endpoint enabled and use it as the container livenessProbe in a Kubernetes cluster. Sometimes, when the third-party system is degraded, the microservice doesn't respond on the /health endpoint and Kubernetes restarts my service.

This is because I'm using a RestTemplate to make HTTP calls, so I'm continuously creating new threads, and the JVM starts to have memory problems.

I have thought about some solutions:

  1. Implement a high availability “/health” endpoint, reserve threads, or something like that.

  2. Use an async http client.

  3. Implement a Circuit Breaker.

  4. Configure custom timeouts for each third-party endpoint that I'm using.

  5. Create another small service (Go) and deploy it in the same pod. This service would handle the liveness probe.

  6. Migrate/refactor the services into smaller services, maybe with other frameworks/languages such as Vert.x, Go, etc.
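
Options 1 and 3 share one idea: cap how many request threads the slow third-party system can tie up, so that some capacity stays free for other endpoints such as /health. Below is a minimal sketch of that idea in plain Java, using a semaphore as a bulkhead. The class name `ThirdPartyBulkhead`, its parameters, and the limits are hypothetical, and a real service would more likely use a resilience library than hand-rolled code:

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical bulkhead: limits how many threads may be inside the slow call at once. */
final class ThirdPartyBulkhead {
    private final Semaphore permits;
    private final long maxWaitMillis;

    ThirdPartyBulkhead(int maxConcurrentCalls, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrentCalls);
        this.maxWaitMillis = maxWaitMillis;
    }

    /**
     * Runs the call if a permit becomes free within maxWaitMillis.
     * Otherwise throws RejectedExecutionException so the caller can fail fast.
     */
    <T> T call(Supplier<T> thirdPartyCall) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new RejectedExecutionException("interrupted while waiting for a permit", e);
        }
        if (!acquired) {
            throw new RejectedExecutionException("too many concurrent third-party calls");
        }
        try {
            return thirdPartyCall.get();
        } finally {
            permits.release(); // always hand the permit back, even if the call throws
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

A controller could wrap its RestTemplate call in `bulkhead.call(...)` and map the rejection to a 503 response, so excess requests fail fast instead of queuing behind the degraded system.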

What do you think?


Solution

  • The actuator health endpoint is very convenient with Spring Boot, almost too convenient in this context, because it does deeper health checks than you necessarily want in a liveness probe. Deeper checks belong in readiness, not liveness. The idea is that if the Pod is overwhelmed for a while and fails readiness, it is withdrawn from load balancing and gets a breather; if it fails liveness, it is restarted. So you want only minimal checks in liveness (see "Should Health Checks call other App Health Checks"). If you use actuator health for both, your busy Pods never get a breather, because they are killed first. Kubernetes also calls the HTTP endpoint periodically for both probes, which adds to your thread usage problem (do consider the periodSeconds setting on the probes).

    For your case you could define a liveness command instead of an HTTP probe (see https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#define-a-liveness-command). The command could just check that the Java process is running, which is somewhat similar to your Go-based probe suggestion.

    In many cases using the actuator for liveness would be fine (think of apps that hit a different constraint before threads, which would be your case if you went async/non-blocking with the reactive stack). Yours is one where it can cause problems. Another is the actuator's probing of availability for dependencies like message brokers, which can cause excessive restarts (in that case on first deploy).
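
To illustrate what a minimal liveness check with its own reserved thread could look like, here is a sketch using the JDK's built-in `com.sun.net.httpserver.HttpServer`. It answers on a separate port from a dedicated thread and checks no dependencies, so it does not compete with the Spring MVC request threads. The class name `LivenessServer`, the `/livez` path, and the port are assumptions for illustration, not part of the original setup:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;

/** Hypothetical liveness endpoint served from its own port and its own thread. */
final class LivenessServer {

    /** Starts the server; passing port 0 lets the OS pick a free port. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);

        // The handler only proves the JVM can still serve a request.
        // It deliberately checks no downstream dependencies.
        server.createContext("/livez", exchange -> {
            byte[] body = "UP".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });

        // One dedicated daemon thread handles probe requests,
        // independent of the application's request thread pool.
        server.setExecutor(Executors.newSingleThreadExecutor(runnable -> {
            Thread t = new Thread(runnable, "liveness");
            t.setDaemon(true);
            return t;
        }));

        server.start();
        return server;
    }
}
```

Calling `LivenessServer.start(8081)` at application startup and pointing the livenessProbe at that port would keep liveness independent of the request thread pool, while readiness could continue to use the deeper actuator checks.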