apache-kafka, apache-kafka-streams, stream-processing

Kafka Streams: merging messages


I have a data payload which is too big for one message. Consider this Avro schema (IDL):

record Likes {...}
record Comments {...}
record Post {
  Likes likes;
  Comments comments;
  string body;
}

Assume likes and comments are large collections: if they are passed together with the post, the message will exceed the max message size, and I don't think it is correct to increase that limit to 10-20 MB.

I want to split the one message into three: post body, comments, and likes. However, I want the database insert to be atomic, so I want to group and merge these messages in consumer memory.
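The producer-side split could be sketched like this in plain Java. The `Post`/`Likes`/`Comments` records below are hypothetical stand-ins for the Avro-generated classes; the point is that all three fragments carry the post id as the Kafka message key:

```java
import java.util.List;

// Hypothetical stand-ins for the Avro-generated classes.
record Likes(List<String> userIds) {}
record Comments(List<String> texts) {}
record Post(String id, Likes likes, Comments comments, String body) {}

// One fragment of a post. All three fragments share the post id as the
// Kafka key, so the default partitioner routes them to the same partition.
record PostPart(String key, String type, Object payload) {}

public class PostSplitter {
    static List<PostPart> split(Post post) {
        return List.of(
            new PostPart(post.id(), "body", post.body()),
            new PostPart(post.id(), "likes", post.likes()),
            new PostPart(post.id(), "comments", post.comments()));
    }

    public static void main(String[] args) {
        Post post = new Post("post-1",
            new Likes(List.of("alice", "bob")),
            new Comments(List.of("nice post")),
            "hello world");
        for (PostPart part : split(post)) {
            System.out.println(part.key() + " -> " + part.type());
        }
    }
}
```

Each `PostPart` would then be serialized and sent as its own record, all three with the same key.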

Can I do it with Kafka Streams? Can I have a stream without an output topic (since the output message would again exceed the max size)?

If you have any ideas for the same input (one large message exceeding the configured max message size), please share.


Solution

  • Yes, you can do it with Kafka Streams, merging the messages in the datastore, and you can have a stream without an output topic. You need to make sure that the three parts go to the same partition (so they reach the same instance of the application), so they should share the same key.
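    A minimal sketch of the consumer-side buffering this describes, in plain Java: the Kafka Streams state store is replaced by an in-memory map and the atomic DB transaction by a callback. In a real application this logic would live in a `Processor` backed by a persistent state store, with the callback performing one DB transaction per completed post:

    ```java
    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.BiConsumer;

    // Buffers the three fragments of a post until all have arrived, then
    // hands the merged parts to a single callback (where the atomic DB
    // insert would happen). Stand-in for a Kafka Streams Processor with
    // a persistent state store.
    public class PostAssembler {
        private static final int EXPECTED_PARTS = 3; // body, likes, comments

        private final Map<String, Map<String, Object>> buffer = new HashMap<>();
        private final BiConsumer<String, Map<String, Object>> onComplete;

        public PostAssembler(BiConsumer<String, Map<String, Object>> onComplete) {
            this.onComplete = onComplete;
        }

        // Called once per incoming fragment; key is the post id.
        public void accept(String key, String partType, Object payload) {
            Map<String, Object> parts = buffer.computeIfAbsent(key, k -> new HashMap<>());
            parts.put(partType, payload); // duplicate fragments just overwrite
            if (parts.size() == EXPECTED_PARTS) {
                buffer.remove(key);            // done buffering this post
                onComplete.accept(key, parts); // one atomic insert per post
            }
        }

        public static void main(String[] args) {
            PostAssembler assembler = new PostAssembler(
                (key, parts) -> System.out.println("insert " + key + " with " + parts.keySet()));
            assembler.accept("post-1", "body", "hello");
            assembler.accept("post-1", "likes", java.util.List.of("alice"));
            assembler.accept("post-1", "comments", java.util.List.of("nice"));
        }
    }
    ```

    Because all fragments of a post share one key and thus one partition, a single instance sees all of them, and no output topic is needed: the "output" is the DB write inside the callback.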

    You may also use three topics, one per object, and then join them (again with the same key).

    But generally Kafka is designed to handle many small messages, and it does not work well with large ones. Maybe you should consider sending not the whole object in one message but only incremental changes: just the information that was updated.
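    The incremental-change idea could look like this: instead of shipping the full Likes/Comments collections, each update is sent as a small delta event and applied to consumer-side state. The event types and fields here are hypothetical, not part of the question's schema:

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Small delta events replacing the large Likes/Comments collections.
    sealed interface PostEvent permits LikeAdded, CommentAdded {}
    record LikeAdded(String userId) implements PostEvent {}
    record CommentAdded(String text) implements PostEvent {}

    // Consumer-side state, rebuilt by applying deltas in order.
    public class PostState {
        final List<String> likes = new ArrayList<>();
        final List<String> comments = new ArrayList<>();

        void apply(PostEvent event) {
            if (event instanceof LikeAdded like) likes.add(like.userId());
            else if (event instanceof CommentAdded comment) comments.add(comment.text());
        }

        public static void main(String[] args) {
            PostState state = new PostState();
            state.apply(new LikeAdded("alice"));
            state.apply(new CommentAdded("nice post"));
            System.out.println(state.likes.size() + " likes, " + state.comments.size() + " comments");
        }
    }
    ```

    Each delta easily fits under the default max message size, and keying the events by post id keeps them ordered per post.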