I have a pod running on Kubernetes for which I am designing a liveness probe. My application reads from a queue (via a loop which continually searches for new messages and executes other functions if it finds one) and is not exposed via HTTP, so I need a command liveness probe. I am pondering whether a simple implementation would work:
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
However, I'm unsure whether the cat would succeed even if the application was 'stuck' at some point in the loop - the file would still be there.
This comes down to a fundamental lack of understanding of liveness probes which I was unable to find in the documentation - presumably they run somehow in series with your application so if your app is not running, the command cannot be executed? But I am not confident on this point.
If the command can be executed in parallel then I believe I will need some kind of timestamp check where I update a file on each loop and the liveness probe checks its timestamp. If the first way works it is simpler, but can anyone confirm if this is the case? Thanks.
Edit: my app code. I added in the sleep(60)s to try and test whether the liveness probe would fail if the file hadn't been updated in a minute, but they wouldn't be part of the normal app code.
    from time import sleep

    # INITIALISATION CODE
    with open('loaded.txt', 'w') as f:  # readiness probe = check this file exists
        f.write('loaded')
    current_backoff = 0
    max_backoff = 10
    while True:
        if current_backoff < max_backoff:
            current_backoff += 1
        # rewrite the file so its modification time is refreshed on each loop
        with open('loaded.txt', 'w') as f:
            f.write('loaded')
        sleep(60)  # test only: makes the file older than the probe window
        messages = input_queue_client.receive_messages(visibility_timeout=100)
        for message in messages:
            with open('loaded.txt', 'w') as f:
                f.write('loaded')
            sleep(60)  # test only
            current_backoff = 0
            # CODE TO PROCESS MESSAGES
        sleep(current_backoff)
My liveness probe attempts:
1.

    livenessProbe:
      exec:
        command:
        - find
        - /var/app/loaded.txt
        - -mmin
        - '+0.1'
      initialDelaySeconds: 10
      periodSeconds: 10
2.

    livenessProbe:
      exec:
        command:
        - find
        - /var/app/loaded.txt
        - -mmin
        - '+0.1'
        - -exec
        - cat
        - '/var/app/loaded.txt{}'
        - ;
      initialDelaySeconds: 10
      periodSeconds: 10
3.

    livenessProbe:
      exec:
        command:
        - find
        - /var/app/loaded.txt
        - -mmin
        - '+0.1'
        - -exec
        - if[[{}]]
        - ;
      initialDelaySeconds: 10
      periodSeconds: 10
I have also tried all of these with - instead of +. The probe never fails despite the very short window (which will eventually be longer!) and the sleep command.
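Liveness probing is done by the kubelet on each node. And yes, it runs in parallel with your application: the probe command is executed inside the container as a separate process, so cat /tmp/healthy keeps succeeding even if your loop is stuck.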
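To illustrate why a plain existence check cannot detect a hung loop, here is a small local simulation. It is only a sketch: a Python thread stands in for the application and a child process stands in for the probe, so it models the idea rather than the kubelet itself. The names stuck_app, existence_probe and demo are made up for this example.

```python
import subprocess
import sys
import tempfile
import threading
from pathlib import Path


def stuck_app(marker: Path, started: threading.Event, release: threading.Event) -> None:
    """Stand-in for the queue loop: writes the marker once, then hangs."""
    marker.write_text("loaded")
    started.set()
    release.wait()  # simulates the loop being stuck until we release it


def existence_probe(marker: Path) -> int:
    """Run a cat-like check in a separate process and return its exit code."""
    code = "import sys; open(sys.argv[1]).read()"
    return subprocess.run([sys.executable, "-c", code, str(marker)]).returncode


def demo() -> int:
    """Return the probe's exit code while the simulated app is hung."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = Path(tmp) / "healthy"
        started, release = threading.Event(), threading.Event()
        worker = threading.Thread(
            target=stuck_app, args=(marker, started, release), daemon=True
        )
        worker.start()
        started.wait(timeout=5)
        exit_code = existence_probe(marker)  # the "app" is blocked at this point
        release.set()  # let the worker finish so the temp dir can be cleaned up
        worker.join(timeout=5)
        return exit_code
```

The probe process reads the file and exits with 0 even though the worker is blocked, which is why the check has to look at how recently the file was updated rather than whether it exists.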
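In your case, you could touch the /tmp/healthy file each time you start a new iteration of the loop, and use a command like find /tmp/healthy -mmin -0.5 in the health check. With GNU find this prints the file name only if the file was modified within the last half minute (other find implementations may not accept fractional minutes). Note that an exec probe passes when the command exits with status 0, and find exits with 0 whether or not anything matched, so the check has to turn empty output into a failure, for example sh -c 'test -n "$(find /tmp/healthy -mmin -0.5)"'.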
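If the container image has Python but no suitable find, the same touch-and-check idea can be sketched in Python. This is a minimal sketch under assumptions, not a definitive implementation: the 30-second threshold and the function names beat, is_alive and probe_exit_code are illustrative and not taken from the question.

```python
import time
from pathlib import Path

HEARTBEAT_FILE = Path("/tmp/healthy")  # assumed path, same file the probe reads
MAX_AGE_SECONDS = 30.0                 # assumed staleness threshold


def beat(path: Path = HEARTBEAT_FILE) -> None:
    """Create the file if needed and set its mtime to now (like `touch`)."""
    path.touch()


def is_alive(path: Path = HEARTBEAT_FILE, max_age: float = MAX_AGE_SECONDS) -> bool:
    """Return True only if the file exists and was modified within max_age seconds."""
    try:
        age = time.time() - path.stat().st_mtime
    except FileNotFoundError:
        return False
    return age <= max_age


def probe_exit_code(path: Path = HEARTBEAT_FILE, max_age: float = MAX_AGE_SECONDS) -> int:
    """Map the freshness check to a process exit status: 0 healthy, 1 stale or missing."""
    return 0 if is_alive(path, max_age) else 1
```

The application would call beat() at the top of each loop iteration, and the script run by the exec probe would end with sys.exit(probe_exit_code()), since the probe treats exit status 0 as success and anything else as failure. The threshold should be comfortably longer than the slowest healthy iteration, otherwise long-running messages would get the pod restarted.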