Basic info
Hi, I'm encountering a problem with Kubernetes StatefulSets. I'm trying to spin up a set with 3 replicas. These replicas/pods each have a container which pings a container in the other pods based on their network-id. The container requires a response from all the pods. If it does not get a response the container will fail. In my situation I need 3 pods/replicas for my setup to work.
Problem description
What happens is the following. Kubernetes starts 2 pods rather fast. However since I need 3 pods for a fully functional cluster the first 2 pods keep crashing as the 3rd is not up yet. For some reason Kubernetes opts to keep restarting both pods instead of adding the 3rd pod so my cluster will function.
I've seen my setup run properly after about 15 minutes because Kubernetes added the 3rd pod by then.
Question
So, my question.
Does anyone know a way to delay restarting failed containers until the desired amount of pods/replicas have been booted?
I've since found out the cause of this. StatefulSets launch pods in a specific order. If one of the pods fails to launch it does not launch the next one.
You can add a podManagementPolicy: "Parallel"
to launch the pods without waiting for previous pods to be Running
.
See this documentation