To me it seems like Spring Kafka's suffix mechanism prevents the proper use of a transactional.id in a Kafka application.
As far as I know, the transactional.id has quite a few requirements for proper usage by Kafka. These are hard to explain (especially covering all cases), so I will concentrate on the case where "read / process / write" is done with exactly-once semantics. I think I should briefly walk through this case as an example, so we are on the same page; it is also quite complex and maybe I have a flaw in my understanding.
In general, some process reads a payload M0 from some partition P0 of topic T0, processes it, creates a result F(M0), and writes that result to another topic T1.
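The read / process / write flow can be sketched with a toy in-memory broker. All names here (Broker, transactional_write, F) are invented for illustration and are not the Kafka client API; the point is the atomic "write output record + commit input offset" step, which is the essence of the exactly-once pattern:

```python
# Minimal in-memory sketch of the read / process / write pattern described
# above. Broker, transactional_write, and F are invented names, not the
# Kafka client API.

class Broker:
    def __init__(self):
        self.topics = {"T0": ["M0"], "T1": []}   # T0.P0 holds the input payload
        self.committed_offsets = {"T0.P0": 0}    # consumer-group progress

    def transactional_write(self, topic, record, offset_key, next_offset):
        # Commit the output record and the input offset atomically; this is
        # roughly what sendOffsetsToTransaction + commitTransaction achieve.
        self.topics[topic].append(record)
        self.committed_offsets[offset_key] = next_offset

def F(payload):
    # the "processing" step from the example
    return f"F({payload})"

broker = Broker()
offset = broker.committed_offsets["T0.P0"]
m0 = broker.topics["T0"][offset]                              # read M0 from T0.P0
broker.transactional_write("T1", F(m0), "T0.P0", offset + 1)  # write F(M0) to T1

print(broker.topics["T1"])          # ['F(M0)']
print(broker.committed_offsets)     # {'T0.P0': 1}
```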
With transactions it would work like this: the producer registers its transactional.id with the transaction coordinator. If the producer dies in a non-graceful way, it could leave behind a transaction in the "open" state under its transactional.id. If a new producer comes up with the same transactional.id, it would be able to take over and manage the open transactions (either by succeeding or aborting).
But if Spring Kafka adds a running number as a suffix per created producer (a new producer is created for each template call, unless one is taken from the cache), then a restarted application could end up with a different transactional.id even though it is the same application consuming the same input topic partition.
For example, the original transaction used the id T0.P0.10 (where T0.P0 is the given prefix and 10 the running-number suffix), while the restarted application uses T0.P0.1.
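The restart scenario can be simulated with a hypothetical factory that hands out "prefix plus running number" ids, the way the question assumes the producer cache does. SuffixingProducerFactory and its methods are invented names for illustration:

```python
# Invented sketch of the suffix concern: a factory that appends a running
# number to a fixed prefix, reset on every application restart.
import itertools

class SuffixingProducerFactory:
    def __init__(self, prefix):
        self.prefix = prefix
        self._counter = itertools.count(1)   # running number restarts at 1

    def next_producer_id(self):
        return f"{self.prefix}.{next(self._counter)}"

# First run of the application: ten producers are created over time,
# so the last (possibly open) transaction runs under suffix 10.
run1 = SuffixingProducerFactory("T0.P0")
last_id_before_crash = [run1.next_producer_id() for _ in range(10)][-1]

# After a restart the counter starts over, so the first producer gets a
# different id than the one that held the open transaction.
run2 = SuffixingProducerFactory("T0.P0")
first_id_after_restart = run2.next_producer_id()

print(last_id_before_crash)    # T0.P0.10
print(first_id_after_restart)  # T0.P0.1
```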
Am I missing something here? What is the purpose of this suffix?
(I am posting this as a question because I am not sure whether this is really a bug in Spring Kafka, and I know the maintainers prefer to have such discussions on Stack Overflow rather than (yet) as a ticket.)
sources:
We have to suffix the transactional.id because you can only have one producer with that id. In a multi-threaded environment, we would have to single-thread all transactions, which would defeat the purpose.
...it would be able to take over and manage the open transactions (either by succeeding or aborting).
That is not correct; the failed transaction will always be aborted, and a new producer with the same transactional.id would simply fence the old one.
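A toy transaction coordinator makes this fencing behaviour concrete. The epoch bookkeeping loosely mirrors Kafka's real producer epochs, but every class and method name here is invented:

```python
# Invented sketch of producer fencing: registering a transactional.id bumps
# an epoch, aborts any leftover open transaction, and rejects writes from
# producers still holding the old epoch.

class Coordinator:
    def __init__(self):
        self.epochs = {}    # transactional.id -> current epoch
        self.open_tx = {}   # transactional.id -> payload of an open transaction

    def init_transactions(self, tx_id):
        # A new producer registering the id bumps the epoch, which fences
        # any older producer still using the same transactional.id ...
        self.epochs[tx_id] = self.epochs.get(tx_id, 0) + 1
        # ... and any transaction left open by the old producer is aborted,
        # never "taken over" and completed.
        aborted = self.open_tx.pop(tx_id, None)
        return self.epochs[tx_id], aborted

    def begin(self, tx_id, epoch, payload):
        if epoch != self.epochs[tx_id]:
            raise RuntimeError("ProducerFencedException")  # stale producer
        self.open_tx[tx_id] = payload

coord = Coordinator()
old_epoch, _ = coord.init_transactions("T0.P0.10")
coord.begin("T0.P0.10", old_epoch, "in-flight write")   # producer dies here

new_epoch, aborted = coord.init_transactions("T0.P0.10")
print(aborted)      # 'in-flight write' is aborted, not resumed
try:
    coord.begin("T0.P0.10", old_epoch, "zombie write")
except RuntimeError as e:
    print(e)        # ProducerFencedException
```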
The transaction will time out if no new producer appears with the same id within the transaction timeout.
EDIT
See KIP-447 - since EOS mode V2 (previously BETA), producer fencing is based on the consumer group metadata instead of just the transactional.id.
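The generation-based check from KIP-447 can be sketched the same way. GroupCoordinator and its fields are invented for illustration; the idea is that offsets sent with a stale consumer group generation are rejected, so fencing no longer depends on the transactional.id encoding the input partition:

```python
# Invented sketch of generation-based fencing: the group generation is
# bumped on every rebalance, and offsets tagged with a stale generation
# are rejected even though the producer's transactional.id is arbitrary.

class GroupCoordinator:
    def __init__(self):
        self.generation = 0    # bumped on every consumer-group rebalance

    def rebalance(self):
        self.generation += 1

    def send_offsets(self, offsets, generation):
        if generation != self.generation:
            raise RuntimeError("stale consumer group generation")
        return "offsets committed"

group = GroupCoordinator()
group.rebalance()
stale_generation = group.generation   # the old producer joined at this generation

group.rebalance()                     # its partitions were reassigned meanwhile
try:
    group.send_offsets({"T0.P0": 11}, stale_generation)
except RuntimeError as e:
    fenced = str(e)

print(fenced)                                               # stale consumer group generation
print(group.send_offsets({"T0.P0": 11}, group.generation))  # offsets committed
```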