Apache Spark and NiFi Integration


I want to send a NiFi flowfile to Spark, do some transformations in Spark, and then send the result back to NiFi so that I can do further operations in NiFi. I don't want to write the flowfile to a database or HDFS and then trigger a Spark job; I want to send the flowfile directly to Spark and receive the result directly from Spark in NiFi. I tried using the ExecuteSparkInteractive processor in NiFi, but I am stuck. Any examples would be helpful.


Solution

  • You can't send data directly to Spark unless you are using Spark Streaming. With traditional batch Spark, the job has to read its input from some kind of storage such as HDFS. The purpose of ExecuteSparkInteractive is to trigger a Spark job (submitted through an Apache Livy session) that runs on data which has already been delivered to HDFS.

    If you want to go the streaming route, there are two options:

    1) Directly integrate NiFi with Spark Streaming

    https://blogs.apache.org/nifi/entry/stream_processing_nifi_and_spark

    2) Use Kafka to integrate NiFi and Spark

    NiFi publishes flowfiles to a Kafka topic, Spark reads from that topic and writes its results to a second topic, and NiFi consumes the results from that second topic. This approach would probably be the best option.
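A minimal PySpark Structured Streaming sketch of the Spark side of the Kafka route could look like the following. Everything concrete in it is an assumption rather than something from the answer: the topic names `nifi-in` and `nifi-out`, the broker `localhost:9092`, the checkpoint path, and the idea that each flowfile's content is a JSON object whose hypothetical `value` field gets upper-cased. On the NiFi side it would be paired with a PublishKafka processor writing to `nifi-in` and a ConsumeKafka processor reading from `nifi-out` (the exact processor names carry a Kafka-version suffix that depends on your NiFi release). The transformation is kept as a plain function so it can be tested without Spark, and pyspark is only imported inside `run()`.

```python
import json
from typing import Optional

# Hypothetical settings; adjust to your environment.
BOOTSTRAP_SERVERS = "localhost:9092"
INPUT_TOPIC = "nifi-in"    # topic NiFi publishes flowfiles to
OUTPUT_TOPIC = "nifi-out"  # topic NiFi consumes results from
CHECKPOINT_DIR = "/tmp/nifi-spark-checkpoint"


def transform(payload: Optional[str]) -> Optional[str]:
    """Placeholder transformation for one flowfile's content.

    Assumes the content is a JSON object; upper-cases its hypothetical
    "value" field and tags the record so NiFi can see it was processed.
    """
    if payload is None:  # Kafka messages may carry a null value
        return None
    record = json.loads(payload)
    record["value"] = str(record.get("value", "")).upper()
    record["processed_by"] = "spark"
    return json.dumps(record)


def run() -> None:
    """Start the streaming query: read INPUT_TOPIC, transform, write OUTPUT_TOPIC."""
    # Imported lazily so transform() can be unit-tested without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("nifi-spark-kafka").getOrCreate()
    transform_udf = udf(transform, StringType())

    # Each Kafka record's value holds the content of one NiFi flowfile.
    incoming = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", BOOTSTRAP_SERVERS)
        .option("subscribe", INPUT_TOPIC)
        .load()
    )

    # The Kafka sink needs a "value" column; the key is passed through unchanged.
    results = incoming.select(
        col("key"),
        transform_udf(col("value").cast("string")).alias("value"),
    )

    query = (
        results.writeStream.format("kafka")
        .option("kafka.bootstrap.servers", BOOTSTRAP_SERVERS)
        .option("topic", OUTPUT_TOPIC)
        .option("checkpointLocation", CHECKPOINT_DIR)
        .start()
    )
    query.awaitTermination()
```

To run it, call `run()` at the end of the script and submit it with the Kafka connector on the classpath, for example `spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:<your-spark-version> job.py` (adjust the Scala suffix to your build). Passing the Kafka message key through unchanged lets NiFi correlate each result with the flowfile that produced it, provided a key was set when publishing.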