apache-flink, flink-streaming

How to simulate a Flink streaming job failure scenario


I am running a Flink streaming job inside a Flink cluster and need to simulate a job failure scenario. I have introduced a corrupted event in my source. When that event arrives, the job throws exceptions and its tasks restart from the checkpoint. I have set the following configuration:

restart-strategy.fixed-delay.attempts: 1
restart-strategy.fixed-delay.delay: 5 s

But these settings are not honoured: the tasks keep recovering even after the first attempt. According to the documentation, the job should fail after the first attempt. How can I simulate the job failure scenario?


Solution

  • You can set

    restart-strategy: none

    in which case the job fails immediately, without attempting a restart.

    https://ci.apache.org/projects/flink/flink-docs-stable/dev/task_failure_recovery.html#no-restart-strategy
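
For illustration, here is a minimal sketch of how the two setups might look in flink-conf.yaml. It assumes a Flink release that uses the `restart-strategy` key, as on the page linked above. The file location, and the guess that the question's configuration did not select a strategy explicitly, are assumptions about the asker's setup rather than something stated in the question.

```yaml
# Option 1: fail the job on the first task failure, with no restart attempt.
restart-strategy: none

# Option 2 (use instead of option 1): allow one restart, 5 s after the failure,
# then fail the job on the next failure.
# Assumption: the fixed-delay.* keys only take effect when fixed-delay is
# explicitly selected as the strategy, so the first line below is needed.
# restart-strategy: fixed-delay
# restart-strategy.fixed-delay.attempts: 1
# restart-strategy.fixed-delay.delay: 5 s
```

Note that a restart strategy set in the job code itself (for example via `env.setRestartStrategy(...)`) overrides the cluster default from the configuration file, so it is worth checking that the job does not set one.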