hadoop, hadoop-yarn, spring-cloud-dataflow

How to deploy an autonomous application with Spring Cloud Data Flow?


My application is configured to read from a topic on a configured Kafka cluster and write the transformed result to Hadoop HDFS. To do this, it needs to be launched on a YARN cluster node.

We'd like to use Spring Cloud Data Flow for this. But since the application doesn't need any input from another flow (it already knows where to pull its source from) and outputs nothing, how can I create a valid Data Flow stream from it? In other words, this would be a stream composed of only one app, which should run indefinitely on a YARN node.


Solution

  • In this case you need a stream definition that connects to a named destination in Kafka and writes to HDFS.

    For instance, the stream would look like this:

    stream create a1 --definition ":myKafkaTopic > hdfs"

    You can read here for more info on this.
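
As a rough sketch of the full sequence in the Data Flow shell: creating a stream does not start it, so it is followed here by a deploy step and a status check. This assumes the server uses the Kafka binder (so that the named destination :myKafkaTopic maps to the Kafka topic of the same name) and that an app named hdfs is already registered as a sink; neither is confirmed by the question.

```
stream create a1 --definition ":myKafkaTopic > hdfs"
stream deploy a1
stream list
```

Once deployed, the stream keeps running until it is undeployed or destroyed, which fits the "run indefinitely" requirement. If the existing application has to stay in one piece, the same named-destination syntax should also work with a custom sink registered under its own name in place of hdfs, though that setup is not covered by the answer above.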