apache-kafka, apache-kafka-streams

Kafka - Stream vs Topic


What is the difference between a Kafka topic and a stream? I thought they were the same thing, but this doc talks about creating a stream from a topic, which confused me.

https://docs.ksqldb.io/en/latest/developer-guide/create-a-stream/

Questions:

  1. What is the difference between a Kafka topic and a stream?
  2. If a topic already gives us a stream of events, why do we need to create a stream from a topic?
  3. Can we create a table from a topic directly, or do we have to create a stream first?

Solution

    1. What is the difference between a Kafka topic and a stream?

    A. A topic is a collection of partitions, and each partition holds a sequence of messages. On the broker, a partition is stored as a directory on disk.

    A stream is a flow of data, whether it comes from a single topic or from a collection of topics. Kafka Streams also provides the method StreamsBuilder#stream(Collection<String> topics), which shows that a stream is not confined to a single topic.

    2. If a topic already gives us a stream of events, why do we need to create a stream from a topic?

    A. A stream is the basic entity in Kafka Streams, and the term "stream" is used here in the Kafka Streams sense. A stream passes through a set of processors. Internally, Kafka Streams creates a consumer that reads the underlying topic(s).

    As noted above, a stream can also span a collection of topics. So if you want to consume several topics and process them together, you create one stream over those topics.
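
To make the "stream over several topics, passed through processors" idea concrete, here is a minimal sketch. It assumes the kafka-streams library is on the classpath; the topic names (orders, returns, events-normalized) and the class name are made up for illustration. It only builds and prints the topology, so no broker is needed.

```java
import java.util.List;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

class MultiTopicStream {

    // Builds a topology that reads two (hypothetical) topics as ONE stream.
    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();

        // stream(Collection<String>, Consumed): one KStream backed by several topics.
        KStream<String, String> events = builder.stream(
                List.of("orders", "returns"),
                Consumed.with(Serdes.String(), Serdes.String()));

        // Each step below becomes a processor node in the topology.
        events.filter((key, value) -> value != null && !value.isEmpty())
              .mapValues(value -> value.toUpperCase())
              .to("events-normalized", Produced.with(Serdes.String(), Serdes.String()));

        return builder.build();
    }

    public static void main(String[] args) {
        // Prints the source, processor, and sink nodes without connecting to Kafka.
        System.out.println(buildTopology().describe());
    }
}
```

To actually run it, the topology would be passed to a KafkaStreams instance together with a configuration (application id, bootstrap servers), which is omitted here.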

    3. Can we create a table from a topic directly, or do we have to create a stream first?

    A. Yes, you can create a table directly from a topic, using either the Kafka clients API or the Kafka Streams API.

    If you are using Kafka Streams in your application, you can use the StreamsBuilder#table() or StreamsBuilder#globalTable() methods.

    If you are using the Kafka clients API, you have to consume the topic manually and populate a map or some other data structure with the messages.
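
As a rough sketch of the two methods just mentioned (again assuming kafka-streams is on the classpath, with hypothetical topic names customers and countries), both tables are declared straight from topics, with no intermediate stream:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KTable;

class TablesFromTopics {

    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();

        // KTable: each application instance holds only the partitions assigned to it.
        KTable<String, String> customers = builder.table(
                "customers", Consumed.with(Serdes.String(), Serdes.String()));

        // GlobalKTable: every application instance holds a full copy of the topic's data.
        GlobalKTable<String, String> countries = builder.globalTable(
                "countries", Consumed.with(Serdes.String(), Serdes.String()));

        return builder.build();
    }

    public static void main(String[] args) {
        // Describes the topology only; nothing connects to a broker here.
        System.out.println(buildTopology().describe());
    }
}
```

In a real application the two tables would then be joined against a stream or queried; they are left unused here to keep the sketch short.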
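
The "populate a map" approach can be sketched without any Kafka dependency. In the example below, the Message class is a hypothetical stand-in for the key/value pairs a consumer would hand you in offset order; a real application would get them from a poll loop instead. Treating a null value as a delete is the convention a KTable follows, and it is adopted here as an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TableFromRecords {

    // Hypothetical stand-in for a consumed record: a key and a possibly-null value.
    static final class Message {
        final String key;
        final String value;

        Message(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Replays messages in order: the latest value per key wins, a null value deletes the key.
    static Map<String, String> materialize(List<Message> messagesInOffsetOrder) {
        Map<String, String> table = new LinkedHashMap<>();
        for (Message message : messagesInOffsetOrder) {
            if (message.value == null) {
                table.remove(message.key);             // delete marker ("tombstone")
            } else {
                table.put(message.key, message.value); // insert or update
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<Message> messages = List.of(
                new Message("alice", "Berlin"),
                new Message("bob", "Paris"),
                new Message("alice", "Madrid"), // update for alice
                new Message("bob", null));      // delete for bob
        System.out.println(materialize(messages)); // prints {alice=Madrid}
    }
}
```

This is the same "latest value per key" view that a table gives you; with the clients API you also have to handle offsets, rebalancing, and state recovery yourself.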

    Kafka Streams is useful when the processing forms a topology. For simple applications that just consume, process, and commit, without multiple processing stages, the Kafka clients API should be good enough. Anything that can be achieved with Kafka Streams can also be achieved with the Kafka clients.

    Kafka Streams mainly makes complex workflows simpler, but it can also be used for simple workflows.