apache-kafka streaming

What are the advantages and disadvantages of using several Kafka topics compared to using a big Kafka topic?


I cannot find any literature on this; I only found a couple of articles that don't make it any clearer.

For example, let's say I want to stream tweets. I could create one big Kafka topic called 'tweets' and send every tweet to it, or I could create several smaller Kafka topics for the different subjects the tweets are about: dogs, cats, horses, etc. (imagine that these subjects are relevant for the project).

What would be the advantages and disadvantages of using several smaller Kafka topics instead of using a general Kafka topic?


Solution

  • imagine that these subjects are relevant for the project

    That is the most important consideration. Otherwise, you end up with a topic for every possible word, millions of them, multiplied again by every language you support. That will not scale.

    There is a middle ground, too: keep a single topic and route specific messages to certain partitions.


    The only real deciding factor is ordering: records with the same key should end up in the same partition, because Kafka only guarantees order within a partition.

    There is also an upper bound on the number of topics that a Kafka cluster can reasonably support.
    The practical limit on the number of partitions in a single topic is generally considered higher than the number of topics the cluster can hold.
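
The key-to-partition idea above can be sketched without a broker. This is a simplified simulation, not Kafka client code: `NUM_PARTITIONS`, `partition_for`, and `send` are hypothetical names, and CRC32 is a stand-in for the hash, since Kafka's default partitioner uses murmur2 on the key bytes. It only illustrates that using the subject as the record key keeps all tweets for that subject in one partition, in send order.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for a single 'tweets' topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition using a stable hash.

    Stand-in only: Kafka's default partitioner uses murmur2, not CRC32.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Each simulated partition is an append-only list, like a partition log.
partitions = defaultdict(list)


def send(key: str, value: str) -> int:
    """Append (key, value) to the partition chosen by the key; return its index."""
    p = partition_for(key)
    partitions[p].append((key, value))
    return p


tweets = [
    ("dogs", "first dog tweet"),
    ("cats", "first cat tweet"),
    ("dogs", "second dog tweet"),
    ("horses", "first horse tweet"),
]
for subject, text in tweets:
    send(subject, text)

# All 'dogs' records share one partition, so their relative order is preserved.
dog_partition = partition_for("dogs")
dog_tweets = [v for k, v in partitions[dog_partition] if k == "dogs"]
print(dog_partition, dog_tweets)
```

Different subjects may still share a partition when there are more subjects than partitions; a consumer that cares about one subject would then filter by key, as the last lines do.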