java apache-flink flink-streaming

Flink DataStream - how to start a source from an input element?


Say I have a Flink SourceFunction<String> called RequestsSource.

On each request coming in from that source, I would like to subscribe to an external data source (for the purposes of an example, it could start a separate thread and start producing data on that thread).

The output data from all subscriptions should be merged into a single DataStream. For example:

Input Requests: A, B
Data produced:
 A1
 B1
 A2
 A3
 B2
 ...

... and so on, with new elements being added to the DataStream forever.

How do I write a Flink Operator that can do this? Can I use e.g. FlatMapFunction?


Solution

  • It sounds like you are asking about an operator that, after receiving subscription events, can emit one or more unbounded streams of data based on a connection to an external service. The only clean way I can see to do this is to do all the work in the SourceFunction, or in a custom operator.

    I don't believe async I/O can emit an unbounded stream of results from a single input event, since each input's ResultFuture is completed once with a finite collection of results. A ProcessFunction can do that, but only via its onTimer method.
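
To make the "do all the work in the SourceFunction" option more concrete, here is a minimal plain-Java sketch of the fan-in pattern such a source might use: one worker thread per request, with every worker pushing into a single queue that one loop drains. The class `SubscriptionFanIn` and its methods are hypothetical names invented for illustration, not part of Flink, and the worker simply counts up as a stand-in for a real external subscription. In an actual SourceFunction, the `sink` passed to `drain` would presumably wrap `ctx.collect(...)` inside `run(SourceContext<String> ctx)`, so that only one thread ever emits records.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper: one worker per request, all results funnelled into one queue. */
class SubscriptionFanIn {
    private final BlockingQueue<String> results = new LinkedBlockingQueue<>();
    private final ExecutorService workers = Executors.newCachedThreadPool();
    private volatile boolean running = true;

    /** Starts a stand-in subscription that produces key1, key2, ... up to key<count>. */
    void subscribe(String key, int count) {
        workers.submit(() -> {
            for (int i = 1; i <= count && running; i++) {
                results.add(key + i); // a real worker would forward data from the external service
            }
        });
    }

    /**
     * Hands up to max queued results to the sink from the calling thread,
     * waiting at most timeoutMs for each one. Returns the number emitted.
     */
    int drain(int max, long timeoutMs, Consumer<String> sink) {
        int emitted = 0;
        try {
            while (running && emitted < max) {
                String next = results.poll(timeoutMs, TimeUnit.MILLISECONDS);
                if (next == null) {
                    break; // nothing arrived within the timeout
                }
                sink.accept(next);
                emitted++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return emitted;
    }

    /** Stops the drain loop and interrupts the workers, as a source's cancel() would. */
    void cancel() {
        running = false;
        workers.shutdownNow();
    }
}
```

Subscribing to "A" and "B" and then draining yields the interleaved A1, B1, A2, ... sequence from the question. Order is preserved within each key, but the interleaving across keys depends on thread scheduling. The sketch leaves out checkpointing and state, which a real source would need in order to restore its subscriptions after a failure.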