I'm studying sharding with mongodb and I have the follow structure:
I have one database named erp and 3 collections, pessoas, produtos and contatos.
So I have add my collections using:
sh.shardCollection("erp.<collection>", { id: 1 }, true)
I begin with collection pessoas, this collection have 2000 documents and are distributed this way:
mongos> db.pessoas.getShardDistribution()
Shard rs1 at rs1/desenv1:27019,desenv1:27020
data : 57KiB docs : 1497 chunks : 36
estimated data per chunk : 1KiB
estimated docs per chunk : 41
Shard rs3 at rs3/desenv1:27022,desenv1:27023
data : 19KiB docs : 503 chunks : 36
estimated data per chunk : 541B
estimated docs per chunk : 13
Totals
data : 77KiB docs : 2000 chunks : 72
Shard rs1 contains 75.27% data, 74.85% docs in cluster, avg obj size on shard : 39B
Shard rs3 contains 24.72% data, 25.15% docs in cluster, avg obj size on shard : 38B"
After this I have add the collection produtos, and I gave to her 1001 registers, so why this collection are distributed this way:
mongos> db.produtos.getShardDistribution()
Shard rs1 at rs1/desenv1:27019,desenv1:27020
data : 67KiB docs : 1001 chunks : 1
estimated data per chunk : 67KiB
estimated docs per chunk : 1001
Totals
data : 67KiB docs : 1001 chunks : 1
Shard rs1 contains 100% data, 100% docs in cluster, avg obj size on shard : 69B"
Questions:
Why only replicaSet "rs1" are getting data? The same thing happen with the collection contatos, only replicaSet "rs1" gets the data and I can't distribute the data to the other shard.
Why this happens and what I'm doing wrong?
How do I distribute equally the data? For example with 2000 registers, 1000 registers in one shard and 1000 in the another shard.
If you guys need more information just tell me.
Thanks
MongoDB balance the shards by using the number of chunks, not documents (see https://docs.mongodb.com/manual/core/sharding-balancer-administration/). Therefore, from the output you provided, the cluster is balanced. Shard rs1
contains 36 chunks, and shard rs3
also contains 36 chunks for the pessoas
collection.
If the number of documents is not balanced, that means that your inserts are going into a small number of chunks (or even a single chunk in the worst case), and not distributed across all the chunks. This typically caused by using a monotonically increasing shard key.
Please see Shard Keys for more information about this subject, and how to avoid this situation. Note that shard key selection is very important, since once a shard key is selected, it cannot be changed anymore. The only way to change the shard key of a collection is to dump the collection, and change the shard key during the restore process.