apache-flink, flink-streaming

Select node for a Flink DataStream execution


I have searched a lot but haven't found a solution for this.

Let's suppose some steps of the streaming process must be executed on just a subset of the available nodes/TaskManagers, while the rest of the tasks are free to run anywhere.

How can I assign a DataStream to be executed ONLY on a subset of nodes?

This is required mainly for input/sink tasks, as not every node in the cluster has the same connectivity / security restrictions.

I'm new to Flink, so please forgive me if I'm asking something obvious.

Thanks a lot.


Solution

  • As explained in the thread [1], this cannot really be achieved at the "DataStream level", only at the "job level".

    As Vino Yang explains [1], in Flink 1.6 on YARN we can set a node label for a job [2] and get some control over where the job as a whole is allocated, but this is not possible for "low-level" tasks.

    Thanks to Vino for sharing his knowledge.

    [1] http://mail-archives.apache.org/mod_mbox/flink-user/201808.mbox/%[email protected]%3E

    [2] https://issues.apache.org/jira/browse/FLINK-7836
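
    As a rough illustration of the job-level approach from [2], the node label can be set in flink-conf.yaml through the yarn.application.node-label option. This is only a sketch: the label name restricted-io is made up for this example, and it assumes the YARN administrators have already enabled node labels and assigned that label to the nodes with the required connectivity.

    ```yaml
    # flink-conf.yaml (Flink on YARN, 1.6 or later)
    # "restricted-io" is a hypothetical label name; use one defined in your YARN cluster.
    # All containers of this Flink application (JobManager and TaskManagers)
    # are requested on nodes carrying the label, not just individual operators.
    yarn.application.node-label: restricted-io
    ```

    Since the label applies to the whole application, a common workaround is to split the input/sink stages that need special connectivity into their own job and submit only that job with the label.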