We are starting to use Kafka in a backend redevelopment, and have a quick question about how to structure the messages that we produce and consume.
Imagine we have a user microservice that handles CRUD operations on users. The two structures that have been put forward as a possibility are:
1) Four Kafka topics, one for each operation. The message value would contain just the data needed to perform the operation, i.e.
topic: user_created
message value: {
  firstName: 'john',
  surname: 'smith'
}
topic: user_deleted
message value: c73035d0-6dea-46d2-91b8-d557d708eeb1 // A UUID
and so on
2) A single topic for user related events, with a property on the message describing the action to be taken, as well as the data needed, i.e.
// User created
topic: user_events
message value: {
  type: 'user_created',
  payload: {
    firstName: 'john',
    surname: 'smith'
  }
}
// User deleted
topic: user_events
message value: {
  type: 'user_deleted',
  payload: c73035d0-6dea-46d2-91b8-d557d708eeb1 // A UUID
}
I am in favour of the first system described, although my inexperience with Kafka renders me unable to argue strongly why. We would greatly value any input from more experienced users.
I worked on this kind of architecture recently.
We used an API gateway, which was the web service that communicated with our front end (ReactJS in our case). The API gateway exposed a REST interface. That microservice, developed with Spring Boot, acted as both a producer and a consumer, running in a separate thread:
1- Producing messages: send a message to the Kafka broker on the topic "producer_topic" (a producer sketch follows this list)
2- Consuming messages: listen for incoming messages from Kafka on the topic "consumer_topic"
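For the producing side, a minimal sketch using spring-kafka's KafkaTemplate might look like the following (the class and method names here are illustrative, not our actual code):

import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class UserEventProducer {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public UserEventProducer(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Publish a user event as a JSON string. Using the user id as the
    // message key means all events for one user land on the same partition,
    // so they are consumed in order.
    public void publish(String userId, String eventJson) {
        kafkaTemplate.send("producer_topic", userId, eventJson);
    }
}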
For consuming, there was a pool of threads that handled the incoming messages, plus an executor service that listened to the Kafka stream and assigned each message to a thread from the pool.
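A rough sketch of that consuming loop with the plain kafka-clients API (the pool size, group id, and handler are illustrative):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerLoop {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "api-gateway");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        ExecutorService pool = Executors.newFixedThreadPool(8); // worker pool

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("consumer_topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Hand each message to a worker thread from the pool.
                    // Note: with auto-commit enabled, offsets can be committed
                    // before a worker finishes; production code needs more
                    // careful offset handling.
                    pool.submit(() -> handle(record.value()));
                }
            }
        }
    }

    static void handle(String message) {
        // parse the message and react to it (application-specific)
        System.out.println("handling: " + message);
    }
}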
Below that, there was a DAO microservice that handled the Kafka messages and performed the CRUD operations.
The message format looked very much like your second approach.
// content of messages on the consumer_topic
{
  event_type: 'delete',
  message: {
    first_name: 'John Doe',
    user_id: 'c73035d0-6dea-46d2-91b8-d557d708eeb1'
  }
}
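For illustration, the DAO microservice could dispatch on that event_type field roughly like this (Jackson for parsing; the DAO methods are hypothetical placeholders):

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class UserEventDispatcher {

    private final ObjectMapper mapper = new ObjectMapper();

    // Route one consumed message to the right CRUD operation
    public void dispatch(String value) throws Exception {
        JsonNode event = mapper.readTree(value);
        JsonNode message = event.get("message");
        switch (event.get("event_type").asText()) {
            case "create":
                createUser(message);
                break;
            case "update":
                updateUser(message);
                break;
            case "delete":
                deleteUser(message.get("user_id").asText());
                break;
            default:
                throw new IllegalArgumentException("unknown event_type");
        }
    }

    // hypothetical DAO methods backed by the database
    void createUser(JsonNode m) { /* INSERT ... */ }
    void updateUser(JsonNode m) { /* UPDATE ... */ }
    void deleteUser(String id)  { /* DELETE ... */ }
}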
This is why I would recommend the second approach. There is less complexity, since you handle all CRUD operations with a single topic. It is still fast thanks to partition parallelism, and you can add replication to be more fault tolerant.
The first approach sounds good in terms of decoupling and separation of concerns, but it doesn't really scale. For instance, if you want to add another operation, that's one more topic to create. Also consider replication: you would have more replicas to manage, which I think is a drawback.
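To make the replication point concrete: with a single topic you choose the partition count and replication factor once, for example via Kafka's AdminClient (the numbers below are just examples):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateUserEventsTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // one topic: 6 partitions for parallelism,
            // replication factor 3 for fault tolerance
            NewTopic topic = new NewTopic("user_events", 6, (short) 3);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}

With the per-topic approach you would repeat this (and any future changes to it) for every operation's topic.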