apache-spark spark-streaming apache-kafka-streams spark-structured-streaming

Are Spark Streaming, Structured Streaming and Kafka Streaming the same thing?


I have come across three popular streaming technologies: Spark Streaming, Structured Streaming and Kafka Streaming. I have gone through various sites but could not find a clear answer: are these three the same thing or different? If they are not the same, what is the basic difference? I am not looking for an in-depth answer, just an answer to the question above (yes or no) and a short introduction to each of them so that I can explore further. :)

Thanks in advance, Subrat


Solution

  • No, they are three different things. I assume you are referring to Kafka Streams when you say "Kafka Streaming".

    Kafka Streams is a JVM library that is part of Apache Kafka. It provides an abstraction layer for processing data stored in Kafka topics. Applications that use the Kafka Streams library can run anywhere; they do not have to run on the Kafka cluster itself, and doing so is actually not recommended. They consume data from the Kafka cluster, process it, and produce the results back to it.

    Spark Streaming is part of Apache Spark, a distributed data processing library, and provides stream (as opposed to batch) processing. Spark initially provided batch computation only, so Spark Streaming was added as a dedicated layer for stream processing. Spark Streaming can be fed with data from Kafka, but it can be connected to other sources as well.

    Structured Streaming, also part of Apache Spark, is a different approach that was introduced to overcome certain limitations of the stream processing approach used by Spark Streaming. It was added to Spark from a certain version onwards (2.0, IIRC).
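
To make the distinction a bit more concrete, here is a toy sketch in plain Python (standard library only). It is not the API of Kafka Streams, Spark Streaming or Structured Streaming, and the names `EVENTS`, `per_record`, `micro_batches` and `running_table` are hypothetical. It assumes the processing models as they are commonly documented: Kafka Streams handles records one at a time, Spark Streaming (DStreams) processes small batches of records, and Structured Streaming keeps the result of a query over an ever-growing input table up to date.

```python
from collections import Counter

# Hypothetical stand-in for a Kafka topic or any other source:
# a short, finite list of text lines.
EVENTS = ["a b", "b c", "c c", "a"]


def per_record(events):
    """Kafka Streams style (assumed model): transform each record as it
    arrives and emit the result right away."""
    for line in events:
        for word in line.split():
            yield word.upper()


def micro_batches(events, batch_size):
    """Spark Streaming style (assumed model): cut the stream into small
    batches and run an ordinary batch computation on each one."""
    for start in range(0, len(events), batch_size):
        batch = events[start:start + batch_size]
        # Word counts for this batch only; nothing is carried over.
        yield Counter(word for line in batch for word in line.split())


def running_table(events, batch_size):
    """Structured Streaming style (assumed model): treat the input as a
    table that keeps growing and keep the query result up to date."""
    totals = Counter()
    for batch_counts in micro_batches(events, batch_size):
        totals.update(batch_counts)  # fold the new rows into the result
        yield dict(totals)           # snapshot of the result so far


if __name__ == "__main__":
    print(list(per_record(EVENTS)))
    print(list(micro_batches(EVENTS, 2)))
    print(list(running_table(EVENTS, 2)))
```

With a batch size of 2, `micro_batches` yields one independent word count per batch, while `running_table` yields a cumulative count that grows as more input arrives. The real libraries add distribution, fault tolerance and connectors on top of these ideas, so treat this only as a mental model before reading the official guides.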