Search code examples
mongodbcluster-computingsharding

MongoDB sharded cluster. How to distribute the disk storage between the nodes?


We successfully created a working MongoDB cluster with the following setup:

  • 2x MongOS Routers
  • 2x Replicated Config nodes (metadata)
  • 6x Non Replicated Shard nodes (no replica intentionally)

All separate instances are identical in terms of CPU, MEM, disk space, etc.

The cluster itself seems to work great. We are using these sharding commands in order to distribute the shards:

sh.shardCollection("blockchains.ethereum_balance",{"dapp_id":"hashed"})
sh.shardCollection("blockchains.ethereum_daily",{"to":"hashed"})

etc.

However, the storage distributions seem not equal or at least not efficient:

mongodb_shards_disk_usage

Questions:

  1. If we add a new shard, does the new node get some information from "older" nodes? (The "old" shards are moving partial data)
  2. How to manage the storage distribution in this case?

Any ideas appreciated.

EDIT:

It seems that newly created shards have just a portion of some collections. Automatic migration takes time.

In addition, indexes are kept only per one shard only. In order Mongo to move some chunks to newly created shards indexes should be in target shards too.


Solution

  • In a sharded cluster you have the Sharded Cluster Balancer which takes care about distributing the data evenly. By default you don't have to do anything manually.

    Check the sharding status with sh.status() and db.ethereum_balance.getShardDistribution() / db.ethereum_daily.getShardDistribution()

    Perhaps you selected a poor Shard Key, see Choosing a Shard Key. Do you have a big unsharded collection in your database?