Let's say I have a compacted Kafka topic and I populate it with entities. The topic has only 1 partition.
For this example I use Employees as the entity. An Employee may have a Superior.
These are the messages in the topic:
Message number: 1
Key: 1
Values:
Employee id: 1
Employee name: Joe A.
Superior employee id: <null>
---
Message number: 2
Key: 2
Values:
Employee id: 2
Employee name: Frank L.
Superior employee id: 1
---
Message number: 3
Key: 1
Values:
Employee id: 1
Employee name: Joe A.-F. // Name of Employee 1 has changed.
Superior employee id: <null>
After some time, the topic is compacted. For the Employee with id 1, this means that message number 1 is removed and only message number 3 remains.
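To make the effect concrete, here is a minimal sketch that simulates compaction on the three messages above. It does not use a Kafka client. The `Record` type and the `compact` function are hypothetical names for illustration, the message number stands in for the offset, and the "keep only the latest record per key" rule is a simplification of what the broker's log cleaner does (for example, it ignores that the active segment is not compacted).

```python
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    number: int                 # message number from the example, used like an offset
    key: int                    # Kafka message key (the employee id)
    name: str
    superior_id: Optional[int]  # None models <null>


def compact(log: List[Record]) -> List[Record]:
    """Simplified model of compaction: keep the latest record per key, in log order."""
    latest = {}
    for rec in log:
        latest[rec.key] = rec  # a later record with the same key replaces the earlier one
    return sorted(latest.values(), key=lambda r: r.number)


log = [
    Record(1, 1, "Joe A.", None),
    Record(2, 2, "Frank L.", 1),
    Record(3, 1, "Joe A.-F.", None),
]

compacted = compact(log)
for rec in compacted:
    print(rec.number, rec.key, rec.name, rec.superior_id)
# 2 2 Frank L. 1
# 3 1 Joe A.-F. None
```

In this model, a consumer reading the compacted log from the start sees Frank L. (Superior employee id 1) before it sees any record for Employee 1.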
A client consuming this topic may want to build a relational model of Employees.
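After compaction, the client reads message number 2, which references Employee 1 as Superior, before it has read any message for Employee 1.
What's the best way to deal with inconsistencies like this?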
I think you should introduce the concept of message types; I guess the messages in your log should be something like this:
So, if the message type is part of the key you compact by, you keep the most recent message of each type for each entity, and because of that, the first message will not be deleted.
Then the question is: do you really need compaction? Disk space is cheap, and these messages are small.
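As a rough sketch of that idea, the simulation below compacts by a composite key of message type and employee id. The type names `"created"` and `"renamed"`, the `Message` type, and the `compaction_key` helper are assumptions made for illustration; they are not taken from the example messages, and the compaction rule is again a simplified model rather than the broker's actual behaviour.

```python
from typing import List, NamedTuple, Optional, Tuple


class Message(NamedTuple):
    number: int
    msg_type: str               # hypothetical message type, e.g. "created" or "renamed"
    employee_id: int
    name: str
    superior_id: Optional[int]  # None models <null>


def compaction_key(msg: Message) -> Tuple[str, int]:
    """Assumed composite key: the message type together with the employee id."""
    return (msg.msg_type, msg.employee_id)


def compact(log: List[Message]) -> List[Message]:
    """Simplified model of compaction: keep the latest message per composite key."""
    latest = {}
    for msg in log:
        latest[compaction_key(msg)] = msg
    return sorted(latest.values(), key=lambda m: m.number)


log = [
    Message(1, "created", 1, "Joe A.", None),
    Message(2, "created", 2, "Frank L.", 1),
    Message(3, "renamed", 1, "Joe A.-F.", None),
]

surviving = [m.number for m in compact(log)]
print(surviving)  # [1, 2, 3]
```

Under these assumptions message 1 survives, so Employee 1 still appears before the message that references it. A later rename of Employee 1 would replace message 3 but not message 1.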