I'm using MongoDB 3.2.15 with Python and the PyMongo driver on an Ubuntu Server 14.04 64-bit operating system. The documents that I am inserting with bulk operations are similar to the following:
{
    "_id" : ObjectId("59d7d59bf1566a1f541d42d5"),
    "monitor" : 5,
    "tiempo" : 1,
    "senial1" : {
        "0" : 0.164844,
        "1" : 0.325524
    },
    "senial2" : {
        "0" : 0.569832,
        "1" : 0.128563
    }
}
Each bulk operation inserts 100 documents of this type. When the execution of one bulk ends, the next one begins. I executed 43,200 bulks, so I inserted 4,320,000 documents in total.
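For reference, a minimal sketch of the kind of bulk loop described above, using PyMongo's bulk_write with InsertOne (the connection string, database and collection names, and field values are placeholders, not the original code):

from pymongo import MongoClient, InsertOne

client = MongoClient("mongodb://localhost:27017")   # mongos address (placeholder)
coll = client["testdb"]["seniales"]                 # hypothetical database/collection names

TOTAL_BULKS = 43200
DOCS_PER_BULK = 100

for b in range(TOTAL_BULKS):
    requests = [
        InsertOne({
            "monitor": 5,                           # placeholder values
            "tiempo": b * DOCS_PER_BULK + i,
            "senial1": {"0": 0.164844, "1": 0.325524},
            "senial2": {"0": 0.569832, "1": 0.128563},
        })
        for i in range(DOCS_PER_BULK)
    ]
    coll.bulk_write(requests, ordered=False)        # one bulk of 100 inserts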
As I am using a sharded cluster as a test, I performed the same process twice. The first time I used a hashed shard key on the _id field. The second time I used a compound shard key on the monitor and tiempo fields.
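For completeness, this is roughly how the two shard keys could be declared from PyMongo via admin commands; the database and collection names are assumptions, and the collection would be dropped and recreated between the two runs:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")    # connect to a mongos (placeholder address)
client.admin.command("enableSharding", "testdb")     # hypothetical database name

# First run: hashed shard key on _id
client.admin.command("shardCollection", "testdb.seniales", key={"_id": "hashed"})

# Second run: compound shard key on monitor + tiempo
# client.admin.command("shardCollection", "testdb.seniales", key={"monitor": 1, "tiempo": 1})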
My question: immediately after the 43,200 bulks finished executing, I used the count() method to see whether all the documents had been inserted correctly. When I used the compound shard key I got the correct result: 4,320,000 documents. But when I used the hashed shard key the result was 4,328,872 documents. After a few minutes I ran count() again, and this time the number of documents was correct. Why does the count() method report more documents than I inserted? And why did I get this behavior with one type of key and not with the other?
Thank you very much.
Note: my cluster has 2 shard replica sets.
Hashed keys are distributed more evenly than compound keys, especially compound keys with low cardinality.
The balancer started to migrate chunks at some point, so you had the same documents on both shards during the migration. count() sums the per-shard counts without filtering, so the in-flight copies were counted twice until the migration finished and the orphaned documents were cleaned up.
https://docs.mongodb.com/manual/sharding/#sharding-strategy and the related docs have a good explanation of what's going on behind the scenes.
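As a side note, the MongoDB documentation recommends an aggregation-based count on sharded clusters, because count() can include orphaned or in-migration documents. A sketch with placeholder names:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")    # mongos address (placeholder)
coll = client["testdb"]["seniales"]                  # hypothetical database/collection names

# An aggregation $group count goes through the shards' ownership filter,
# so documents duplicated by an in-progress migration are not double-counted.
res = list(coll.aggregate([{"$group": {"_id": None, "n": {"$sum": 1}}}]))
print(res[0]["n"] if res else 0)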