Tags: performance, mongodb, ubuntu, sharding

MongoDB sharding does not improve application performance in a lab setup


I'm currently developing a mobile application powered by a MongoDB database. Although everything is working fine right now, we want to add sharding to be prepared for the future.

In order to test this, we've created a lab environment (running on Hyper-V) to test the various scenarios.

The following servers have been created:

  • Ubuntu Server 14.04.3 Non-Sharding (Database Server) (256 MB RAM / CPU limited to 10%).
  • Ubuntu Server 14.04.3 Sharding (Configuration Server) (256 MB RAM / CPU limited to 10%).
  • Ubuntu Server 14.04.3 Sharding (Query Router Server) (256 MB RAM / CPU limited to 10%).
  • Ubuntu Server 14.04.3 Sharding (Database Server 01) (256 MB RAM / CPU limited to 10%).
  • Ubuntu Server 14.04.3 Sharding (Database Server 02) (256 MB RAM / CPU limited to 10%).

A small console application has been created in C# to measure the time needed to perform the inserts.

This console application imports 10,000 persons with the following properties: Name, First Name, Full Name, Date of Birth, and Id.

All 10,000 records differ only in '_id'; all the other fields are identical across records.

It's important to note that every test is run exactly 3 times. After every test, the database is removed so the system is clean again.
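
For reference, here is a minimal sketch of what such a benchmark could look like, using the official MongoDB C# driver. The connection string, class layout, and field values below are assumptions based on the description above, not the original code:

    // Benchmark sketch (assumed reconstruction, not the original application).
    // Requires the MongoDB.Driver NuGet package.
    using System;
    using System.Diagnostics;
    using MongoDB.Bson;
    using MongoDB.Driver;

    class InsertBenchmark
    {
        static void Main()
        {
            // Point this at the mongod (non-sharded test) or at the
            // mongos query router (sharded tests); the address is assumed.
            var collection = new MongoClient("mongodb://192.168.137.11:27017")
                .GetDatabase("DemoDatabase")
                .GetCollection<BsonDocument>("persons");

            var stopwatch = Stopwatch.StartNew();
            for (int i = 0; i < 10000; i++)
            {
                // Every record is identical; only the auto-generated _id differs.
                collection.InsertOne(new BsonDocument
                {
                    { "Name", "Doe" },
                    { "FirstName", "John" },
                    { "FullName", "John Doe" },
                    { "DateOfBirth", new DateTime(1980, 1, 1) }
                });
            }
            stopwatch.Stop();

            Console.WriteLine($"Writing 10,000 records took {stopwatch.Elapsed}");
        }
    }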

Find the results of the tests below:

Insert 10,000 records without sharding

Writing 10,000 records | Non-Sharding environment - Full Disk IO #1: 14 Seconds.
Writing 10,000 records | Non-Sharding environment - Full Disk IO #2: 14 Seconds.
Writing 10,000 records | Non-Sharding environment - Full Disk IO #3: 12 Seconds.

Insert 10,000 records with a single database shard

Note: the shard key has been set to the hashed _id field; a sketch of how such a key can be created follows the excerpt. See the JSON below for (partial) sharding information:

shards:
  {  "_id" : "shard0000",  "host" : "192.168.137.12:27017" }

databases:
  {  "_id" : "DemoDatabase",  "primary" : "shard0000",  "partitioned" : true }
          DemoDatabase.persons
                  shard key: { "_id" : "hashed" }
                  unique: false
                  balancing: true
                  chunks:
                          shard0000       2
                  { "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong(0) } on : shard0000 Timestamp(1, 1)
                  { "_id" : NumberLong(0) } -->> { "_id" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 2)

Results:

Writing 10,000 records | Single Sharding environment - Full Disk IO #1: 1 Minute, 59 Seconds.
Writing 10,000 records | Single Sharding environment - Full Disk IO #2: 1 Minute, 51 Seconds.
Writing 10,000 records | Single Sharding environment - Full Disk IO #3: 1 Minute, 52 Seconds.

Insert 10,000 records with two database shards

Note: the shard key has again been set to the hashed _id field. See the JSON below for (partial) sharding information:

shards:
  {  "_id" : "shard0000",  "host" : "192.168.137.12:27017" }
  {  "_id" : "shard0001",  "host" : "192.168.137.13:27017" }

databases:
  {  "_id" : "DemoDatabase",  "primary" : "shard0000",  "partitioned" : true }
          DemoDatabase.persons
                  shard key: { "_id" : "hashed" }
                  unique: false
                  balancing: true
                  chunks:
                          shard0000       2
                  { "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-4611686018427387902") } on : shard0000 Timestamp(2, 2)
                  { "_id" : NumberLong("-4611686018427387902") } -->> { "_id" : NumberLong(0) } on : shard0000 Timestamp(2, 3)
                  { "_id" : NumberLong(0) } -->> { "_id" : NumberLong("4611686018427387902") } on : shard0001 Timestamp(2, 4)
                  { "_id" : NumberLong("4611686018427387902") } -->> { "_id" : { "$maxKey" : 1 } } on : shard0001 Timestamp(2, 5)

Results:

Writing 10,000 records | Double Sharding environment - Full Disk IO #1: 49 Seconds.
Writing 10,000 records | Double Sharding environment - Full Disk IO #2: 53 Seconds.
Writing 10,000 records | Double Sharding environment - Full Disk IO #3: 54 Seconds.

According to the tests executed above, sharding does work: the more shards I add, the better the performance. However, I don't understand why I'm facing such a huge performance drop when working with shards compared to using a single server.

I need blazing-fast reads and writes, so I thought that sharding would be the solution, but it seems that I'm missing something here.

Can anyone point me in the right direction?

Kind regards


Solution

  • The layers between the routing server and the config server, and between the routing server and the data nodes, add latency. If you have a 1 ms ping and 10k inserts sent one at a time, that alone is 10 seconds of latency that does not appear in the unsharded setup (see the batching sketch at the end of this answer).

    Depending on your configured write concern (if you configured any level of write acknowledgement), you could be adding another 10 seconds to your benchmarks in the sharded environment, because each insert blocks until an acknowledgement is received from the data node (see the write-concern sketch at the end of this answer).

    If your write concern is set to acknowledged and you have replica nodes, then you also have to wait for the write to propagate to those replica nodes, adding further network latency. (You don't appear to have replica nodes, though.) Depending on your network topology, write concern can add multiple layers of network latency if you use the default setting that allows chained replication (secondaries syncing from other secondaries): https://docs.mongodb.org/manual/tutorial/manage-chained-replication/. If you have additional indexes and write concern, each replica node will have to write those indexes before returning a write acknowledgement (it is possible to disable indexes on replica nodes, though).

    Without sharding and without replication (but with write acknowledgement), your inserts still block on each write, but there is no additional latency from the extra network layers.

    Hashing the _id field also has a cost, which may accumulate to a few seconds in total for 10k inserts. You can use an _id field with a high degree of randomness to avoid hashing (sketched at the end of this answer), but I don't think this affects performance much.
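
    To take most of that per-document round-trip cost out of the measurement, batch the writes: one InsertMany call instead of 10,000 InsertOne calls. A sketch, reusing the assumed connection details from the question:

        // Sketch: batched inserts, so network latency is paid per batch
        // rather than per document; mongos splits the batch across shards.
        using System;
        using System.Collections.Generic;
        using MongoDB.Bson;
        using MongoDB.Driver;

        class BatchedInsert
        {
            static void Main()
            {
                var collection = new MongoClient("mongodb://192.168.137.11:27017")
                    .GetDatabase("DemoDatabase")
                    .GetCollection<BsonDocument>("persons");

                var docs = new List<BsonDocument>();
                for (int i = 0; i < 10000; i++)
                {
                    docs.Add(new BsonDocument
                    {
                        { "Name", "Doe" },
                        { "FirstName", "John" },
                        { "FullName", "John Doe" },
                        { "DateOfBirth", new DateTime(1980, 1, 1) }
                    });
                }

                // One bulk operation instead of 10,000 sequential round trips.
                collection.InsertMany(docs);
            }
        }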
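
    To see how much of the gap is acknowledgement overhead, rerun the benchmark with different write concerns on the collection handle. A sketch (same assumed connection details as above):

        // Sketch: one collection handle per write concern, to compare how
        // much time is spent blocking on acknowledgements.
        using MongoDB.Bson;
        using MongoDB.Driver;

        class WriteConcernDemo
        {
            static void Main()
            {
                var collection = new MongoClient("mongodb://192.168.137.11:27017")
                    .GetDatabase("DemoDatabase")
                    .GetCollection<BsonDocument>("persons");

                // W1: block until the shard's primary acknowledges each write.
                var acknowledged = collection.WithWriteConcern(WriteConcern.W1);

                // WMajority: additionally wait until a majority of replica set
                // members have the write (only relevant once you add replicas).
                var majority = collection.WithWriteConcern(WriteConcern.WMajority);

                // Unacknowledged: fire-and-forget, no blocking on acknowledgement.
                var fireAndForget = collection.WithWriteConcern(WriteConcern.Unacknowledged);

                fireAndForget.InsertOne(new BsonDocument("FullName", "John Doe"));
            }
        }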
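
    The random-_id idea would look roughly like this: supply an already-random value as _id yourself and shard on the plain ranged key { _id: 1 } instead of { _id: "hashed" }. A sketch (again with assumed connection details; as said, the saving is likely small):

        // Sketch: supplying an already-random _id (a GUID string), so no hashed
        // shard key is needed; the collection would be sharded on { _id: 1 }.
        using System;
        using MongoDB.Bson;
        using MongoDB.Driver;

        class RandomIdInsert
        {
            static void Main()
            {
                var collection = new MongoClient("mongodb://192.168.137.11:27017")
                    .GetDatabase("DemoDatabase")
                    .GetCollection<BsonDocument>("persons");

                collection.InsertOne(new BsonDocument
                {
                    // Random value: writes still spread across ranged chunks.
                    { "_id", Guid.NewGuid().ToString("N") },
                    { "FullName", "John Doe" }
                });
            }
        }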