Lets say I have 3 machines. Call them A, B, C. I have mongo replication turned on. So assume A is primary. B and C are secondary.
But now I would like to shard the database. But I believe all writes still go to A since A is primary. Is that true ? Can somebody explain the need of sharding and how it increases write throughput ?
I understand mongos will find the right shard. But the writes will have to be done on the primary. And rest other machines B, C will follow. Am I missing something.
I also don't understand the statement : "Each shard is a replica set". The shard is incomplete unless it is also the primary.
Sharding is completely different to replication.
I am now going to attempt to explain your confusion without making you more confused.
When sharding is taken into consideration a replica set is a replicated range of sharded information.
As such this means that every shard in the cluster is actually a replica set in itself holding a range (can hold differnt range chunks but that's another topic) of the sharded data and replicating itself across the members of that self contained replica set.
Imagine that a shard will be a primary of its own replica set with its own set of members and what not. This is what is meant by each shard being a replica set.
As such each shard will have its own self contained primary but at the same time each primary of each replica set of each shard can receive writes from a mongos if the range that the replica set holds matches the shard key target being sent down by the client.
I hope that makes sense.
This post might help: http://www.kchodorow.com/blog/2010/08/09/sharding-and-replica-sets-illustrated/
Does that apply here ? Say I want writes acknowledged from 2 mongod instances. Is that possible ? I still have trouble understanding. Can you please explain with an example ?
Ok let's take an example. You have 3 shards which are in fact 3 replica sets, called rs1 and rs2 and rs3, consisting of 3 nodes each (1 primary and 2 secondaries). If you want a write to be acknowledged by 2 of the members of the replica set then you can do that as you normally would:
db.users.insert({username:'sammaye'},{w:2})
Where username
is the shard key.
The mongos
this query is sent to will seek out the correct shard (node) which is in fact a replica set, connect to that replica set and then perform the insert as normal.
So as an example, rs2 actually holds the range of all usernames starting with the letters m-s
. The mongos will use its internal mapping of rs2 (gotten when connecting to the replica set) along with your write/read concern to judge which members to read from and write to.
Tags will apply all the same too.
If mongos finds a shard, which is not "primary"(as in replication) is the write still performed in the secondary ?
You are still confusing yourself here, there is no "primary" like in replication, there is a seed (sometimes called a master) server for the sharded database but that is a different thing entirely. The mongos does not have to write to the master node in anyway, it is free to write to any part of the sharded cluster that your queries allow it.