Search code examples
apache-kafkakafka-producer-apimessagebroker

How does partitioning in message brokers solves ordering problem?


I got the idea of partitioning in general, but I can't realize how it indeed solves the ordering problem. Taking the Chris Richardson's book example if I have 3 events about a given Order with "shard-key" 1 (Order Created, Order Updated and Order Cancelled). If there's more than one instance per partition how can I ensure the events were processed in order? It's not a downsizing for the same problem?

I mean, in that example all messages goes to the first shard, but won't they round-robin between both instances?


Solution

  • If your records have a key, the default behaviour is for any given key to always be sent to the same partition.

    Partitioning is a divide-and-conquer approach, but comes with some sacrifices that may be quite acceptable in any given problem domain. A topic with multiple partitions has no notion of 'order'; as you point out you can have multiple competing consumers which may run at different speeds.

    Instead, each partition will only ever be assigned to one consumer in a consumer-group, and it is at this level that ordering is strict(ish). I say strict-ish because things can always go wrong, and records could be reprocessed so your ordering is never absolutely guaranteed by out-of-the-box Kafka.

    When you say you need to process things in order, you my need to think how important this is. e.g. you could argue that a bank account's transactions should be processed in order (maybe), so all records for a specific account should be on the same partition, but the relative ordering of two different accounts' activities are unimportant.

    With respect to partitioning strategy, up to V2.3, messages without a key will be sent to partitions in round-robin fashion. From v2.4 onwards, KIP-480 introduced a sticky partitioner to round-robin batches of records, instead of strictly one-at-a-time.