Tags: mongodb, elasticsearch, distributed, raft

How to ensure consistent reads in a distributed system?


In a distributed system, if a write succeeds on only half of the nodes, a later read served by a node that missed the write returns stale data, so reads become inconsistent. How can this be avoided?

client write --> Node1  v
             --> Node2  v
client read  --> Node3  x (the latest data was not read)

My plan:

  • When reading, compare the data version on the current node with the versions on other nodes.
  • If the current node's version turns out to be lower, route the read to a node that holds the newer version.
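
The plan above can be sketched roughly as follows. This is a minimal in-memory model, not the API of any real database: `Replica`, `versioned_read`, and the integer version numbers are all hypothetical names introduced for illustration, and the copy-back step (often called read repair) is an addition that the plan does not mention.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory replica storing key -> (version, value)."""
    name: str
    store: dict = field(default_factory=dict)

    def write(self, key, version, value):
        # Keep only the newest version seen for a key.
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        # Returns (version, value), or None if the key was never written here.
        return self.store.get(key)


def versioned_read(key, local, peers):
    """Read locally, compare versions with peers, and return the newest copy."""
    best = local.read(key)
    for peer in peers:
        candidate = peer.read(key)
        if candidate is not None and (best is None or candidate[0] > best[0]):
            best = candidate
    if best is not None:
        # Assumption: copy the newest version back so the local node catches up.
        local.write(key, *best)
    return best


node1, node2, node3 = Replica("node1"), Replica("node2"), Replica("node3")
for node in (node1, node2, node3):
    node.write("k", 1, "old")

# Version 2 reaches only node1 and node2, as in the diagram above.
node1.write("k", 2, "new")
node2.write("k", 2, "new")

stale = node3.read("k")                              # plain local read
fresh = versioned_read("k", node3, [node1, node2])   # version-checked read
```

Note that this only finds the latest version if at least one of the consulted peers actually received the write, so the set of nodes read and the set of nodes written have to overlap.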

Solution

  • I see that both mongodb and elasticsearch are tagged. I don't know which one you have in mind, but the two databases are very different.

    For MongoDB, replicas are not used by default to speed up reads (see https://docs.mongodb.com/manual/core/read-preference): the default read preference sends every read to the primary and ignores the secondaries. Writes also go to the primary first, and replication to the secondaries happens asynchronously, possibly after the write to the primary has finished (see https://docs.mongodb.com/manual/core/replica-set-members/). Because of that, if you force a read from a secondary, you are not guaranteed to get the newest data.
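
To make the effect of asynchronous replication concrete, here is a toy model. It is only a sketch of the behaviour described above and does not use MongoDB or its driver; `Primary`, `Secondary`, `oplog`, and `replicate` are hypothetical names chosen for this example.

```python
from collections import deque


class Primary:
    """Toy primary: applies writes immediately and queues them for replication."""

    def __init__(self):
        self.data = {}
        self.oplog = deque()  # writes not yet shipped to the secondary

    def write(self, key, value):
        self.data[key] = value
        # The write is acknowledged here, before any secondary has seen it.
        self.oplog.append((key, value))


class Secondary:
    """Toy secondary: only sees writes once replicate() has run."""

    def __init__(self):
        self.data = {}

    def replicate(self, primary):
        # Drain the queued writes in order.
        while primary.oplog:
            key, value = primary.oplog.popleft()
            self.data[key] = value


primary, secondary = Primary(), Secondary()
primary.write("user:1", "v1")
secondary.replicate(primary)

primary.write("user:1", "v2")                      # applied on the primary only
read_primary = primary.data["user:1"]              # newest value
read_secondary_before = secondary.data["user:1"]   # still the old value

secondary.replicate(primary)                       # replication catches up later
read_secondary_after = secondary.data["user:1"]
```

In this model a read from the primary always sees the latest acknowledged write, while a read from the secondary is only as fresh as the last replication pass.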

    Elasticsearch, by design, does not guarantee that you always read the most recent data (see https://www.elastic.co/guide/en/elasticsearch/reference/current/near-real-time.html): search is near real-time, so a newly indexed document may not be visible to searches right away. Either way, even with only one node you may get data that is out of date.
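
The near-real-time behaviour can be pictured with a toy index like the one below. This is a simplified sketch, not Elasticsearch's client API or its internal implementation; `ToyIndex` and its methods are hypothetical names, and the only assumption modelled is that indexed documents become searchable after a refresh step.

```python
class ToyIndex:
    """Toy single-node index: documents become searchable only after refresh()."""

    def __init__(self):
        self._buffer = []      # indexed but not yet searchable
        self._searchable = []  # visible to search()

    def index(self, doc):
        # The write succeeds, but the document is only buffered for now.
        self._buffer.append(doc)

    def refresh(self):
        # Make all buffered documents visible to search.
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, term):
        return [doc for doc in self._searchable if term in doc]


idx = ToyIndex()
idx.index("raft consensus notes")
hits_before = idx.search("raft")   # the write succeeded, but nothing is visible yet
idx.refresh()
hits_after = idx.search("raft")    # visible after the refresh
```

Even with a single node, a search issued between `index` and `refresh` misses the new document, which is the single-node staleness mentioned above.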