Search code examples
mongodbreplicationsharding

MongoDB Sharding + Replication


I am new to MongoDB and I am trying to understand how these two technologies work together:

When using replication for you database, you have a primary node and a bunch of secondaries. To ensure consistency, it's recommended for you to always read from the primary node, right?

So when you use replication with sharding for exemple: You have 2 replicas r1 and r2 in different servers, the partition is made by an id from 1 to 250 and 2 shards, shard 1 with 1 - 125 and shard 2 with 126 - 250.

Now my questions: When using partitioning with sharding it means now that every shard have its own primary node? So when reading information from document with id 130 I have to first find out where the primary node from the shard 2 is located?

For example: r1 have the primary node for 1-125 and a secondary for 126-250

r2 have the primary node for 126-250 and secodnary for 1-125

Is that correct?

Every replica still keeps the full database information?

Best regards


Solution

  • When using replication for you database, you have a primary node and a bunch of secondaries. To ensure consistency, it's recommended for you to always read from the primary node, right?

    Answers is yes and no. Yes is you normally read from primary node but if you read from secondary It is a bit latency but result is nearly the same reading from primary

    No is You no need to check where is primary node to read, just specify replicaset in connect string and forget about replicaset. Just work with this just like single db

    Now my questions: When using partitioning with sharding it means now that every shard have its own primary node?

    Yes

    So when reading information from document with id 130 I have to first find out where the primary node from the shard 2 is located?

    No, when connect to cluster you should connect via mongos https://docs.mongodb.com/manual/reference/program/mongos/ It will do everything for you from finding which shard contain your data, primary node ., etc. With mongos you work with cluster just like a single db.

    The only thing you should care is about performance you should read and understand about shard collection and shard key https://docs.mongodb.com/manual/core/sharding-shard-key/

    For example: r1 have the primary node for 1-125 and a secondary for 126-250. r2 have the primary node for 126-250 and secodnary for 1-125. Is that correct?

    -> Wrong, Data is separated by shard key, read above for detail. In this case If you use id (1 - 250) for shard key.

    • r1 will contain 1- 125 in both primary and secondary (secondary is backup for primary what primary has will be cloned to secondary)
    • r2 will contain 126 - 250 in both primary and secondary too ( for detail r2 primary contain 126 - 250, r2 secondary contain 126 - 250 too. Secondary node is mirror of primary node)

    Every replica still keeps the full database information?

    No, only primary shard contain full database information (https://docs.mongodb.com/manual/core/sharded-cluster-shards/#primary-shard) Every replica set contain a part of shard collection that defined by shard key.Shard collection is big table you want to separate on several machine to improve performance