Search code examples
apache-kafkakafka-consumer-apiapache-pulsar

What is the difference between pulsar and kafka in regards to consumption?


In order to consume data from Kafka, we can have multiple consumers on a topic, totally decoupled. Then, what is meant by no shared consumption on the page(https://streaml.io/blog/pulsar-streaming-queuing) which shares differences between kafka and pulsar?


Solution

  • In his blog, Sijie is referring to shared messaging as queuing. With queuing messaging, multiple consumers are created to receive messages from a single topic. Which consumer gets the message is completely random.

    The issue with implementing the messaging pattern with Kafka lies in way that Kafka consumers mark that they’ve consumed a message. Kafka consumers use what’s called a high watermark for consumer offsets. That means that a consumer can only say, “I’ve processed up to this point” rather than, “I’ve processed this message.”

    Consider the scenario in which multiple Kafka consumers from the same consumer group were processing from the same topic partition and one of the consumers fails due to an exception while the other succeed. Because Kafka does not a have a built-in way to only acknowledge a single message, and only uses a high-water mark, the failed message would be erronously marked as consumed when in fact it failed and needs to be either reprocessed or published to an error queue, etc.

    In order to avoid this situation, you would need to have just a single consumer per partition which limits the comsumption throughput of the topic. Which in turn requires you to increase the number of partitions in order to meet your throughput needs.

    There is a detailed explanation in this blog post