Difference between resumeAfter and startAtOperationTime in MongoDB change streams

Since MongoDB 4.0, change streams accept two distinct parameters for specifying where to resume a change stream: resumeAfter (an opaque resume token) and startAtOperationTime (a timestamp).

Is it possible to completely replace resumeAfter with startAtOperationTime for a safe recovery of change streams by using the clusterTime found in every change event?

What I am particularly concerned about, and could not find precise information on in the documentation, is whether the same rules and guarantees apply to startAtOperationTime regarding what can be resumed and for how long. Is the operation time used here persisted correctly, and can it always be used as a replacement for the document token usually used for resumeAfter?

Solution

Is the operation time used here persisted correctly and can it always be used as a replacement for the document token usually used for resumeAfter?

Which of the two to use depends on your use case.

The two options, resumeAfter and startAtOperationTime, are quite similar, with subtle differences:

startAtOperationTime takes a timestamp, while resumeAfter takes the entire _id of a change stream event document (the resume token).
startAtOperationTime can resume notifications after an invalidate event by opening a new change stream, while resumeAfter is unable to resume a change stream after an invalidate event has closed it.
startAtOperationTime resumes with changes that occurred at or after the specified timestamp, while resumeAfter resumes with the change immediately after the one identified by the token. So if you resume with startAtOperationTime set to the clusterTime of the last event you processed, at least that event is delivered again, and your processing has to tolerate duplicates.
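To make the "at or after" versus "immediately after" difference concrete, here is a small toy model in plain Python with no server involved. The keyword names resume_after and start_at_operation_time are the ones PyMongo's Collection.watch() accepts; the helpers watch_kwargs and simulate, the in-memory list standing in for the oplog, and the (seconds, increment) tuples standing in for BSON timestamps are hypothetical illustrations, not how the server actually selects events.

```python
# Toy model only: a Python list stands in for the oplog, and
# (seconds, increment) tuples stand in for BSON Timestamps.

def watch_kwargs(saved_event: dict, option: str) -> dict:
    """Map a saved change event to a keyword argument for PyMongo's Collection.watch()."""
    if option == "resume_after":
        # resumeAfter takes the event's whole _id (the resume token).
        return {"resume_after": saved_event["_id"]}
    if option == "start_at_operation_time":
        # startAtOperationTime takes the event's clusterTime.
        return {"start_at_operation_time": saved_event["clusterTime"]}
    raise ValueError(f"unknown option: {option!r}")


def simulate(oplog: list, kwargs: dict) -> list:
    """Return the events a reopened stream would deliver in this toy model."""
    if "resume_after" in kwargs:
        ids = [event["_id"] for event in oplog]
        # Resume immediately after the event identified by the token.
        return oplog[ids.index(kwargs["resume_after"]) + 1:]
    start = kwargs["start_at_operation_time"]
    # Resume with every event at or after the given time.
    return [event for event in oplog if event["clusterTime"] >= start]


oplog = [
    {"_id": {"_data": "A"}, "clusterTime": (100, 1)},
    {"_id": {"_data": "B"}, "clusterTime": (100, 2)},
    {"_id": {"_data": "C"}, "clusterTime": (101, 1)},
]
last_processed = oplog[1]

print([e["_id"]["_data"] for e in simulate(oplog, watch_kwargs(last_processed, "resume_after"))])
print([e["_id"]["_data"] for e in simulate(oplog, watch_kwargs(last_processed, "start_at_operation_time"))])
```

The first call prints ['C'] and the second prints ['B', 'C']: with startAtOperationTime the already processed event B comes back, so a consumer that resumes from clusterTime needs idempotent or deduplicating processing.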

Whichever one you choose, the token or timestamp must still fall within the replica set's oplog window. Change streams rely on MongoDB's global logical clock (cluster time), which is synchronized with the distributed oplog, so both options use the same underlying mechanism and are subject to the same limit on how far back you can resume.
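Since both options are bounded by the oplog window, one rough pre-check is to compare the saved time with the oldest entry still in the oplog. The sketch below assumes a replica set (each shard of a sharded cluster has its own oplog), a PyMongo MongoClient-like object, and read access to the local database; oldest_oplog_time and within_oplog_window are hypothetical helper names. The server remains the authority and reports an error if the requested history is no longer available.

```python
def oldest_oplog_time(client):
    """Return the ts field of the oldest entry in the replica set oplog, or None."""
    # local.oplog.rs is the replica set oplog; sorting by $natural ascending
    # returns entries in insertion order, so the first one is the oldest.
    entry = client["local"]["oplog.rs"].find_one(sort=[("$natural", 1)])
    return entry["ts"] if entry is not None else None


def within_oplog_window(resume_time, oldest) -> bool:
    """True if resume_time (e.g. a saved event's clusterTime) is not older than the oplog start."""
    return oldest is not None and resume_time >= oldest
```

For example, you might call within_oplog_window(saved_event["clusterTime"], oldest_oplog_time(client)) before reopening the stream and plan a full resync when it returns False. Note that the window keeps moving, so a check that passes now can still fail a moment later.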

Worth noting: if you would like to start watching a collection and also process changes that were made before you opened the stream, you can pass startAtOperationTime a timestamp you construct yourself (for example, from a wall-clock time), as long as it is still within the oplog window. This is harder to do with resumeAfter, because it requires a token taken from the _id of an actual event.
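A constructed timestamp is simply a (seconds since the Unix epoch, increment) pair. The helper below, cluster_time_parts, is a hypothetical name: it converts a timezone-aware datetime into those two parts, using increment 0, which orders before any nonzero increment within the same second. In PyMongo you would then wrap the result as bson.timestamp.Timestamp(seconds, increment).

```python
from datetime import datetime, timezone


def cluster_time_parts(moment: datetime) -> tuple:
    """Return (seconds, increment) for a BSON Timestamp at the start of moment's second."""
    if moment.tzinfo is None:
        # Naive datetimes are ambiguous; require an explicit time zone.
        raise ValueError("moment must be timezone-aware")
    # Truncate to whole seconds; increment 0 orders before any nonzero
    # increment within that second.
    return int(moment.timestamp()), 0


print(cluster_time_parts(datetime(2024, 1, 1, tzinfo=timezone.utc)))
```

This prints (1704067200, 0). With PyMongo the stream could then be opened with something like collection.watch(start_at_operation_time=Timestamp(*cluster_time_parts(moment))), subject to the oplog window limit described above.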

Also, MongoDB 4.2 adds a new option, startAfter, which takes the _id of an event and resumes the change stream after the operation specified in that resume token. Unlike resumeAfter, it can also resume notifications after an invalidate event, much like startAtOperationTime.
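A minimal consumer loop using startAfter might look like the following sketch. It assumes a PyMongo Collection, or anything with a compatible watch() method; consume is a hypothetical helper, and in real code the token would be persisted durably rather than only returned. Because startAfter accepts the token of an invalidate event, the caller can pass the returned token straight back in to open a new stream after an invalidation, which resumeAfter would not allow.

```python
def consume(collection, saved_token, handle):
    """Process change events and return the resume token of the last one seen."""
    # startAfter (MongoDB 4.2+) resumes after saved_token, even if it
    # belongs to an invalidate event; with no token, start from now.
    options = {"start_after": saved_token} if saved_token is not None else {}
    token = saved_token
    with collection.watch(**options) as stream:
        for change in stream:
            handle(change)
            token = change["_id"]  # persist durably in real code
            if change["operationType"] == "invalidate":
                # The server closes the stream after an invalidate event.
                break
    return token
```

Calling consume in a loop, feeding each returned token into the next call, keeps the consumer running across invalidations such as a collection being dropped and recreated.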

You may also find the compatibility table for resume tokens across MongoDB versions useful.