I am running a Flink streaming job inside a Flink cluster and need to simulate a job failure scenario. I have introduced a corrupted event in my source. When that event arrives, the job throws exceptions and the tasks restart from the checkpoint. I have set the following configuration:
restart-strategy.fixed-delay.attempts: 1
restart-strategy.fixed-delay.delay: 5 s
But these settings are not honoured: the task keeps recovering even after the first attempt. According to the documentation, the job should fail after the first attempt. How can I make the job actually fail so that I can simulate the failure scenario?
You can set
restart-strategy: none
in which case the job will fail directly, without attempting a restart.
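If you would rather allow exactly one restart before the job fails, the fixed-delay keys probably need the strategy itself to be selected as well. The Flink documentation describes restart-strategy.fixed-delay.attempts as applying when restart-strategy is set to fixed-delay. Assuming that line was missing from your configuration (the question does not show it), a sketch of the full setting would be:

```yaml
# Assumption: these lines go in the cluster's Flink configuration file (flink-conf.yaml).
# Select the strategy explicitly so the fixed-delay.* keys below take effect.
restart-strategy: fixed-delay
# Retry once; a further failure after that marks the job as failed.
restart-strategy.fixed-delay.attempts: 1
# Wait 5 seconds before the restart attempt.
restart-strategy.fixed-delay.delay: 5 s
```

As far as I know, when checkpointing is enabled and no strategy is selected, Flink falls back to a fixed-delay strategy with a very large number of attempts, which would match the repeated recovery you are seeing.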