Search code examples
javaakkaapache-flink

Pausing a dataflow in Apache Flink and viewing the state of operators


I am trying to address the following scenario:

I have a table of 1000 rows. I apply a map function to the rows of the table. After the dataflow has started, I want to randomly pause the execution of dataflow. Say,when I pause, 30 rows have been processed. So, the map operator should print its state i.e. 30 rows processed till now. After some time, I can resume the execution.

Is it possible to do so in Flink?

Apache Flink uses Akka i.e. it implements JobClient, JobManager and TaskManager as Akka actors. In Akka, actors can define their own behaviors and communicate through messages. I implemented the pause functionality in Akka, where actors receive Pause message and pause their execution. So I think maybe I can pause the program by sending JobManager a pause message from JobClient. Can someone guide me on how to do this?

If not, are there other ways to achieve these functionalities?


Solution

  • Pausing Flink is not supported, as such. It would be difficult to coordinate a global pause across a cluster, and the benefit of such a feature is not obvious.

    However, there are things you can do which might satisfy your objective(s). For example, if this is a streaming job, you could stop feeding in data. You could then inspect the values of various metrics, such as numRecordsIn, for the operators you are interested in.