amazon-web-services, amazon-sqs, apache-kafka

Kafka or SNS or something else?


Sorry if this is a newbie question, but I'm trying to understand what I should use. As far as I understand, Kafka is:

Apache Kafka is a distributed publish-subscribe messaging system.

And SNS is also a pub/sub system.

My goal is to use a message queue system on AWS with an application that will be distributed over a few servers (by the way, the main language is Python). Because it is on Amazon, my first thought was to use SNS and SQS, but then I saw a lot of people using Kafka on AWS. What are the advantages of one over the other?


Solution

  • The use cases for Kafka and Amazon SQS/Amazon SNS are quite different.

    Kafka, as you wrote, is a distributed publish-subscribe system. It is designed for very high throughput, processing thousands of messages per second. Of course, you need to set it up and cluster it yourself. It supports multiple readers, which may "catch up" with the stream of messages at any point (well, as long as the messages are still on disk). You can use it both as a queue (consumers that share one consumer group split the messages between them) and as a topic (each consumer group receives every message).
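
    The queue-versus-topic distinction can be sketched with a toy model. This is not a Kafka client: the `deliver` function, the consumer names, and the group names are all made up for illustration, and the per-message round-robin only stands in for Kafka's real partition assignment.

```python
from collections import defaultdict


def deliver(messages, subscriptions):
    """Toy model of Kafka-style delivery (hypothetical, not a real client).

    `subscriptions` maps a consumer name to its group name. Every group
    receives every message; within a group, each message goes to exactly
    one member (round-robin here, standing in for partition assignment).
    """
    members = defaultdict(list)
    for consumer, group in subscriptions.items():
        members[group].append(consumer)

    received = defaultdict(list)
    for index, message in enumerate(messages):
        for group_members in members.values():
            consumer = group_members[index % len(group_members)]
            received[consumer].append(message)
    return dict(received)


messages = ["m0", "m1", "m2", "m3"]

# Queue semantics: two workers share one group, so the work is split.
as_queue = deliver(messages, {"worker-a": "jobs", "worker-b": "jobs"})

# Topic semantics: each consumer has its own group, so both see everything.
as_topic = deliver(messages, {"audit": "audit-group", "billing": "billing-group"})
```

    In this sketch the only thing that changes between the two behaviours is how consumers are grouped, which mirrors how consumer groups are used in practice.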

    An important characteristic is that you cannot selectively acknowledge messages as "processed"; the only option is acknowledging all messages up to a certain offset.
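
    To see why this matters, here is a minimal sketch of the bookkeeping a consumer might do when messages finish out of order. The `committable_offset` helper is hypothetical, and for simplicity it tracks the last processed offset, whereas real Kafka clients commit the position of the next message to read.

```python
def committable_offset(processed, last_committed=-1):
    """Return the highest offset that is safe to commit.

    Toy model: committing offset N marks every message up to and including
    N as done, so we may only advance through a gap-free run of processed
    offsets that starts right after the last committed one.
    """
    offset = last_committed
    while offset + 1 in processed:
        offset += 1
    return offset


# Offset 3 failed or is still in flight, although 4 and 5 are finished.
processed = {0, 1, 2, 4, 5}
safe = committable_offset(processed)  # stops at 2, just before the gap
```

    Offsets 4 and 5 cannot be acknowledged on their own here; they only become committable once offset 3 has been handled.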

    SQS/SNS on the other hand:

    • no setup/no maintenance
    • either a queue (SQS) or a topic (SNS)
    • various limitations (on size, how long a message lives, etc)
    • limited throughput: you can batch requests and send them concurrently, but achieving high throughput would still be expensive
    • Regarding duplicate messages: an SQS Standard queue guarantees at-least-once delivery, so a message may occasionally be delivered more than once. Setting an adequate visibility timeout on the queue hides a message from other consumers while it is being processed, which reduces duplicate retrieval but does not eliminate it. An SQS FIFO queue provides exactly-once processing: if the same message is sent again with the same deduplication ID within the 5-minute deduplication interval, the repeat is not delivered. SNS is designed to send the same message to multiple consumers, and it is similar to SQS Standard when it comes to duplicate messages.
    • Both SNS and SQS can send failed messages to a dead-letter queue out of the box, and you can redrive messages out of that queue when you are ready.
    • SNS has built-in notifications for email, SMS, SQS, and HTTP. With Kafka, you would probably have to code that yourself
    • no "message stream" concept
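
    The deduplication behaviour of FIFO queues mentioned in the list above can be illustrated with a small in-memory model. This is only a sketch and does not call AWS: the `FifoDedupModel` class and its `send` method are hypothetical, and returning `False` stands in for "accepted by the service but not delivered again".

```python
DEDUP_WINDOW_SECONDS = 5 * 60  # the 5-minute interval mentioned above


class FifoDedupModel:
    """In-memory stand-in for FIFO deduplication (hypothetical, not boto3)."""

    def __init__(self):
        self._accepted_at = {}  # deduplication id -> time it was accepted
        self.queue = []         # message bodies that would be delivered

    def send(self, body, dedup_id, now):
        """Enqueue `body` unless `dedup_id` was accepted within the window."""
        accepted_at = self._accepted_at.get(dedup_id)
        if accepted_at is not None and now - accepted_at < DEDUP_WINDOW_SECONDS:
            return False  # duplicate inside the window: dropped
        self._accepted_at[dedup_id] = now
        self.queue.append(body)
        return True


q = FifoDedupModel()
first = q.send("order-42", dedup_id="id-42", now=0)     # accepted
retry = q.send("order-42", dedup_id="id-42", now=60)    # dropped, 1 minute later
later = q.send("order-42", dedup_id="id-42", now=301)   # accepted, window expired
```

    Times are passed in explicitly as seconds so the window logic is easy to follow; a real producer would simply retry the send with the same deduplication ID.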

    So overall, I would say SQS/SNS are well suited to simpler tasks and to workloads with a lower volume of messages.