I am writing an application that reads MySQL binlogs and pushes the changes into a Kinesis stream. My use case requires perfect ordering of the MySQL events within the stream, so I am using the PutRecord operation rather than PutRecords, and I am also including the 'SequenceNumberForOrdering' key. But one point of failure remains: the retry logic. Since the SDK call is asynchronous (I'm using the AWS SDK for JavaScript), how can I ensure ordering if a write to Kinesis fails?
Is a blocking write (blocking the event loop until the callback for the put record arrives) too bad a solution? Or is there a better way?
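For reference, my current write path looks roughly like this (a minimal sketch; I'm on the v2 JavaScript SDK, and the stream name and event shape are placeholders):

```js
const AWS = require('aws-sdk');
const kinesis = new AWS.Kinesis();

let lastSequenceNumber = null;

function pushEvent(event, callback) {
  kinesis.putRecord({
    StreamName: 'binlog-stream',   // placeholder
    PartitionKey: event.table,     // placeholder
    Data: JSON.stringify(event),
    // chain each record to the one before it
    ...(lastSequenceNumber && { SequenceNumberForOrdering: lastSequenceNumber }),
  }, (err, data) => {
    if (err) {
      // this is the failure point: if I retry here, later putRecord
      // calls may already be in flight and could land first
      return callback(err);
    }
    lastSequenceNumber = data.SequenceNumber;
    callback(null, data.SequenceNumber);
  });
}
```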
Rather than try to enforce ordering when adding records to the stream, order the records when you read them. In your use case, every binlog entry has a unique file sequence, starting position, and ending position. So it is trivial to order them and identify any gaps.
If you do find gaps when reading, the consumers will have to wait until they're filled. However, assuming no catastrophic failures, all records should be close to each other in the stream, so the amount of buffering should be minimal.
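To make that concrete, here is a sketch of a consumer-side reorderer (the field names startPos/endPos, and how you tag your records, are assumptions about your setup):

```js
// Buffer out-of-order records and release them only when contiguous,
// so a gap stalls delivery instead of corrupting the order.
function makeReorderer(deliver, firstPos) {
  const pending = new Map(); // startPos -> record, for records that arrived early
  let nextPos = firstPos;    // startPos of the record that must be delivered next

  return function accept(record) {
    pending.set(record.startPos, record);

    // Drain everything that is now contiguous; a missing record is a gap,
    // and delivery simply waits until it shows up.
    while (pending.has(nextPos)) {
      const r = pending.get(nextPos);
      pending.delete(nextPos);
      deliver(r);
      nextPos = r.endPos; // the next entry begins where this one ended
    }
  };
}
```

Since each entry's ending position is the next entry's starting position, continuity within a file is just pointer-chasing; rotating to a new binlog file needs one extra rule, which I've left out here.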
By enforcing ordering on the producer side, you are limiting your overall throughput to how fast you can write individual records. If you can keep up with the actual database changes, then that's OK. But if you can't keep up you'll have ever-increasing lag in the pipeline, even though the consumers may be lightly loaded.
Moreover, you can only enforce order within a single shard, so if your producer ever needs to ingest more than 1 MB/second (or 1,000 records/second) you are out of luck. And in my experience, the only way you'd reach 1,000 records/second is via PutRecords; if you're writing a single record at a time, you'll get around 20-30 requests/second.
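To illustrate, a batched producer looks something like this (again a sketch, with the same placeholder names as above; note that PutRecords takes up to 500 records per call but does not accept SequenceNumberForOrdering, which is exactly why it pairs with consumer-side ordering):

```js
// Write a batch with PutRecords and retry only the entries that failed.
// PutRecords can partially fail, so check each entry's ErrorCode.
async function putBatch(events) {
  const resp = await kinesis.putRecords({
    StreamName: 'binlog-stream', // placeholder
    Records: events.map((e) => ({
      PartitionKey: e.table,     // placeholder
      Data: JSON.stringify(e),
    })),
  }).promise();

  if (resp.FailedRecordCount > 0) {
    const retries = events.filter((_, i) => resp.Records[i].ErrorCode);
    await new Promise((r) => setTimeout(r, 100)); // brief backoff
    await putBatch(retries);                      // retry failed entries only
  }
}
```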