stream transactional distributed-system distributedlog

What is the difference between the transaction id and the sequence id of a DistributedLog record?

I used the DistributedLog AsyncLogReader to read records out of a DistributedLog stream. For each log record in the stream, I found two sequence numbers associated with it: a transaction id and a sequence id. Which one should I use for tracking the read position?

Solution

Based on an answer from one of the DistributedLog authors on the mailing list:

In short, the transaction id is an application-supplied sequence number. It is required to be non-decreasing. Users usually use either a timestamp or an offset (bytes written so far) as the transaction id, so that they can use the transaction id to rewind either by time or by offset.

The sequence id is a system-generated sequence number. It indicates the global position of a log record in the stream. If you are familiar with Raft (https://raft.github.io/raft.pdf), it is the same as the log *index* in Raft. There are two typical use cases for the sequence id. You can use it to determine the number of records between any two records, and you can use it to sanity-check the delivery order.
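Both use cases of the sequence id can be sketched in a few lines of Python. This is an illustration only: `records_between` and `check_delivery` are hypothetical helpers, not DistributedLog functions, and the sketch assumes that sequence ids increase by exactly one per record, as the Raft log index does.

```python
def records_between(seq_a, seq_b):
    """Count the records strictly between two records, given their sequence ids."""
    # Assumes consecutive sequence ids, so the gap size is the difference minus one.
    return max(abs(seq_b - seq_a) - 1, 0)


def check_delivery(sequence_ids):
    """Verify that records arrived in order with no gaps or duplicates.

    Returns the last sequence id seen, or None for an empty input.
    """
    previous = None
    for seq in sequence_ids:
        if previous is not None and seq != previous + 1:
            raise RuntimeError(
                f"delivery out of sequence: got {seq} after {previous}"
            )
        previous = seq
    return previous
```

A reader could feed the sequence id of each record it receives into a check like `check_delivery` and treat an exception as a sign of a skipped, repeated, or reordered record.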

There is also an explanation on the API page: http://distributedlog.io/api/core.html#sequence-numbers