Tags: mongodb, replicaset, mongodb-replica-set, changestream

How does MongoDB determine majority in a PSA architecture?


Consider a replica set with 3 nodes: 2 data-bearing nodes and one arbiter (PSA). When one of my data nodes goes down for some reason and I bring it back, it stays in the STARTUP2 state while it syncs with the primary node. During this time I lose my change stream: my replica set has 2 data nodes, but I don't have a majority of nodes to read from.

How can I handle this issue?

I also read this MongoDB doc. Is it possible to set the primary node's priority value higher than that of the secondary node (the one syncing with the primary)? Would that give me a majority even while my secondary node is in the STARTUP2 state?


Solution

  • There are technically two kinds of majority. I'll call them "election majority" and "data majority".

    Arbiters exist to help with the "election majority": they keep a primary available in a PSA architecture should the S go down. However, they are not part of the "data majority".

    The "data majority", in contrast, is made up of data-bearing nodes, which both vote and acknowledge majority reads and majority writes.

    Change streams by design only return documents that have been committed to the "data majority" of voting nodes, because a write that has propagated to that majority will not be rolled back. It would be confusing if a change stream declared that a document was written, the write was then rolled back, and the stream had to issue a "no wait, scratch that, the write didn't happen".

    Thus, by their nature, arbiters are not compatible with majority-read and majority-write scenarios such as change streams or transactions. However, arbiters still have their place in a replica set, provided you know what to expect from them.

    See "What is the default mongod write concern in which version?" for a more complete explanation of write concerns and the effect of having arbiters.

    A member in STARTUP2 is not a secondary yet. It may vote in elections, but it won't acknowledge majority writes, since it's still starting up.

    For change streams, this means that in a PSA architecture the "data majority" is effectively just the P and S of PSA, so neither data-bearing node can be offline if majority reads and writes are to be maintained.

    The best solution is to replace the arbiter with an actual data-bearing node. This way you get majority writes and majority reads, and you can still maintain a majority with one node down.
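
To make the arithmetic concrete, here is a minimal Python sketch of the two majorities for the scenario in the question. It does not talk to a real server. The `Member` class, the function names, and the state strings are hypothetical helpers for illustration. The `min(...)` rule is my understanding of how the `majorityVoteCount` and `writeMajorityCount` values reported by `rs.status()` relate to each other, so treat it as an assumption rather than the server's exact logic.

```python
from dataclasses import dataclass

# Assumption for this sketch: only members in these states acknowledge writes.
ACKNOWLEDGING_STATES = {"PRIMARY", "SECONDARY"}


@dataclass
class Member:
    name: str
    state: str             # "PRIMARY", "SECONDARY", "STARTUP2", "ARBITER" or "DOWN"
    arbiter: bool = False  # arbiters vote but hold no data
    votes: int = 1


def election_majority(members):
    """Votes needed to elect a primary: more than half of all voting members."""
    voting = sum(1 for m in members if m.votes > 0)
    return voting // 2 + 1


def write_majority(members):
    """Data-bearing voting members needed to acknowledge a majority write."""
    data_voting = sum(1 for m in members if m.votes > 0 and not m.arbiter)
    return min(election_majority(members), data_voting)


def has_election_quorum(members):
    """True if enough voting members are reachable to form an election majority."""
    reachable = sum(1 for m in members if m.votes > 0 and m.state != "DOWN")
    return reachable >= election_majority(members)


def can_commit_majority(members):
    """True if enough data-bearing members can acknowledge a majority write."""
    acking = sum(
        1
        for m in members
        if m.votes > 0 and not m.arbiter and m.state in ACKNOWLEDGING_STATES
    )
    return acking >= write_majority(members)


# PSA while the returning data node is still syncing (the situation in the question).
psa_syncing = [
    Member("P", "PRIMARY"),
    Member("S", "STARTUP2"),
    Member("A", "ARBITER", arbiter=True),
]

# Same situation after replacing the arbiter with a data-bearing node (PSS).
pss_syncing = [
    Member("P", "PRIMARY"),
    Member("S1", "STARTUP2"),
    Member("S2", "SECONDARY"),
]

for label, members in (("PSA", psa_syncing), ("PSS", pss_syncing)):
    print(
        label,
        "election majority:", election_majority(members),
        "write majority:", write_majority(members),
        "election quorum:", has_election_quorum(members),
        "majority commit possible:", can_commit_majority(members),
    )
```

In both layouts the election majority and the write majority come out as 2. The difference is who can supply the second acknowledgement: in PSA only the syncing S could, so majority commits (and with them the change stream) stall, while in PSS the other secondary covers for the node in STARTUP2.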