apache-kafka, apache-kafka-streams

What exactly is a data record in Kafka Streams?


I've read plenty of tutorials and the official documentation, but everything I've found on data records is pretty much copy-pasted from one source to another:

  1. A stream partition is an ordered, replayable, and fault-tolerant sequence of immutable data records, where a data record is defined as a key-value pair.

  2. Each stream partition is a totally ordered sequence of data records and maps to a Kafka topic partition. A data record in the stream maps to a Kafka message from that topic.

So what exactly is a data record? Since it maps to a Kafka message, is it safe to say that it is pretty much the same thing, or is it a separate object that holds some information regarding the Kafka message?


Solution

  • A data record is simply a Kafka message, structured as a key-value pair such as name=smith or id=101.

    "Stream" is a high-level term used in the context of Kafka Streams, and Kafka Streams is itself a high-level API built on top of the core kafka-clients API to provide additional functionality.

    Generally, a stream is a flow of data; in this case, it is a sequence of messages, that is, data records.

    So "data record" means the Kafka message itself, not some other object that holds information (metadata) about the Kafka message. If you want to attach such extra information to a message, it is usually stored in the headers of the Kafka message/data record.
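
To make the shape described above concrete, here is a minimal plain-Java sketch. It does not use any Kafka classes: `DataRecord`, `DataRecordDemo`, and the sample key, value, and header are hypothetical names invented for illustration. It only assumes what the answer states (a key-value pair, with extra metadata kept in headers) plus the timestamp that a Kafka message also carries.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a Kafka message / data record.
// NOT a Kafka class; it only mirrors the shape: key, value, timestamp, headers.
record DataRecord(String key, String value, long timestamp, Map<String, byte[]> headers) {}

class DataRecordDemo {
    // Builds one sample record: the key-value pair is the payload,
    // and the header carries extra metadata about it (assumed example data).
    static DataRecord sample() {
        Map<String, byte[]> headers =
                Map.of("source", "crm".getBytes(StandardCharsets.UTF_8));
        return new DataRecord("id-101", "smith", 1_700_000_000_000L, headers);
    }

    // A value-only transformation: returns a new record with the same key,
    // timestamp, and headers, leaving the input record unchanged.
    static DataRecord upperCaseValue(DataRecord r) {
        return new DataRecord(r.key(), r.value().toUpperCase(Locale.ROOT),
                r.timestamp(), r.headers());
    }

    public static void main(String[] args) {
        DataRecord out = upperCaseValue(sample());
        System.out.println(out.key() + " -> " + out.value());
    }
}
```

In real code you would not define this type yourself. The kafka-clients consumer hands you a `ConsumerRecord`, and the Kafka Streams Processor API hands you an `org.apache.kafka.streams.processor.api.Record`; both expose `key()`, `value()`, `timestamp()`, and `headers()` for the same underlying message.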