
In a compacted Kafka topic, how to deal with relationships between entities?


Let's say I have a compacted Kafka topic and I populate it with entities. The topic has only 1 partition.

For this example I use Employees as the entity. An Employee may have a Superior.

These are the messages in the topic:

Message number: 1
Key: 1
Values:
    Employee id: 1
    Employee name: Joe A.
    Superior employee id: <null>

---

Message number: 2
Key: 2
Values:
    Employee id: 2
    Employee name: Frank L.
    Superior employee id: 1

---

Message number: 3
Key: 1
Values:
    Employee id: 1
    Employee name: Joe A.-F. // Name of Employee 1 has changed.
    Superior employee id: <null>

After some time, the topic is compacted. This means that message number 1, the older record for the Employee with id 1, is removed.

A client consuming this topic may want to build a relational model of Employees.

  • When the client now consumes the messages, it receives message number 2 as the first message.
  • Message number 2 contains a reference to a Superior with the id 1.
  • However, the Employee with id 1 does not exist yet, because message number 1 was removed. Employee 1 will only be received with the next message.
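
The effect can be reproduced with a small simulation. This is only a simplified model of compaction (keep the latest record per key, preserve offset order), not Kafka's actual log cleaner; the `Record` type and the `compact` function are names made up for this illustration.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    offset: int                 # "Message number" in the example above
    key: int                    # Kafka record key (the employee id)
    name: str
    superior_id: Optional[int]


def compact(log: list[Record]) -> list[Record]:
    """Keep only the latest record per key, preserving offset order."""
    latest_offset: dict[int, int] = {}
    for record in log:
        latest_offset[record.key] = record.offset
    return [r for r in log if latest_offset[r.key] == r.offset]


log = [
    Record(1, 1, "Joe A.", None),
    Record(2, 2, "Frank L.", 1),
    Record(3, 1, "Joe A.-F.", None),
]

compacted = compact(log)
for r in compacted:
    print(r.offset, r.key, r.name, r.superior_id)
```

In this model the record for Employee 2 (offset 2) comes out ahead of the only remaining record for Employee 1 (offset 3), which is the ordering problem described above.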

What's the best way to deal with these inconsistencies?

  • Sending all Employees in one single message in a hierarchical tree is probably not efficient, since there are many Employees.
  • When a Superior changes, should I resend all affected subordinate Employees, so that the relations can be consumed in the right order?
  • Is the consuming client responsible for dealing with inconsistent states? Does the client need to wait until the data is consistent?
  • Or, in a case such as this one, is it just not possible to use a compacted topic?
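
To make the third option more concrete, here is a minimal sketch of a client that accepts records in any order and keeps track of references it cannot resolve yet. `EmployeeModel` and its methods are hypothetical names, and how the client decides that it has read to the end of the topic is deliberately left out.

```python
from typing import Optional


class EmployeeModel:
    """Hypothetical in-memory model that tolerates forward references."""

    def __init__(self) -> None:
        self.names: dict[int, str] = {}
        self.superior_of: dict[int, Optional[int]] = {}

    def apply(self, employee_id: int, name: str, superior_id: Optional[int]) -> None:
        # Upsert: a later record for the same employee replaces the earlier one.
        self.names[employee_id] = name
        self.superior_of[employee_id] = superior_id

    def dangling(self) -> set[int]:
        """Superior ids that are referenced but have not been received yet."""
        return {
            s for s in self.superior_of.values()
            if s is not None and s not in self.names
        }

    def is_consistent(self) -> bool:
        return not self.dangling()


model = EmployeeModel()

model.apply(2, "Frank L.", 1)        # arrives first after compaction
after_first = model.dangling()       # Employee 1 is still unknown here

model.apply(1, "Joe A.-F.", None)    # arrives second
after_second = model.dangling()      # nothing left unresolved
```

With this approach the reference is stored as a plain id and only treated as resolved once the referenced Employee has arrived, so the client has to accept a temporarily inconsistent model while it is catching up.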

Solution

  • I think you should introduce the concept of message types. The messages in your log would then look something like this:

    • EmployeeCreated #1
    • EmployeeCreated #2
    • EmployeeNameChanged #1

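    Kafka compacts by key only, so for this to work the message type has to be part of the key (for example, employee id plus message type). Compaction then keeps the most recent message of each type per employee, and because of that, the first message will not be deleted.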

    Then the question is: do you really need compaction? Disk space is cheap, and the messages are small.
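
To illustrate the message-type suggestion, here is a small simulation that compares two keying schemes. It assumes the record key is a pair of employee id and message type, which is an assumption of this sketch rather than something the answer spells out; `compact_by_key` is a simplified stand-in for log compaction, not Kafka's real implementation.

```python
from typing import Any, Hashable


def compact_by_key(log: list[tuple[Hashable, Any]]) -> list[tuple[Hashable, Any]]:
    """Keep the latest (key, value) entry per key, preserving log order."""
    last_index = {key: i for i, (key, _value) in enumerate(log)}
    return [entry for i, entry in enumerate(log) if last_index[entry[0]] == i]


# Keyed by employee id only, as in the question.
log_by_id = [
    (1, "Joe A."),
    (2, "Frank L."),
    (1, "Joe A.-F."),
]

# Keyed by (employee id, message type): an assumed composite key.
log_by_type = [
    ((1, "EmployeeCreated"), {"name": "Joe A.", "superior_id": None}),
    ((2, "EmployeeCreated"), {"name": "Frank L.", "superior_id": 1}),
    ((1, "EmployeeNameChanged"), {"name": "Joe A.-F."}),
]

compacted_by_id = compact_by_key(log_by_id)
compacted_by_type = compact_by_key(log_by_type)

print(compacted_by_id)
print([key for key, _value in compacted_by_type])
```

With the composite key, the creation record for Employee 1 survives and still precedes the record for Employee 2. The trade-off is that a consumer has to combine several records per employee, and a later second name change would replace the earlier `EmployeeNameChanged` record.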