
MongoDB sharding not balancing


We are reaching the limits of a standard replica set, and we are testing a migration to a sharded cluster.

I have created a fresh new sharded cluster on 6.0.3 with 3 shards (each shard is 2 data + 1 arbiter).

I have restored a sample collection of 92 GB (about 10 million documents).

I have successfully created indexes and sharded the collection:

sh.shardCollection(
    "saba_ludu.MyCollection", 
    { UniqueId: "hashed" },
    {
        collation: {locale : "simple"}
    }
)

After that, the collection was not balancing at all; all the data stayed on the primary shard. The following command returned balancerCompliant as true.

sh.balancerCollectionStatus("saba_ludu.MyCollection")

The first odd thing: I ran a command that failed with an error saying it was not available because the feature compatibility version of the cluster was too low (I never configured anything like this on the cluster...). I ran the command to move to compatibility version 6.0, and right after that the collection started to balance across the shards and created a lot of chunks.
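For anyone wanting to check this up front, here is a minimal sketch of how the compatibility version could be compared against a target. The helper name `fcvNeedsUpgrade` is hypothetical (not a MongoDB API); it only inspects a document shaped like the result of `db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 })`, which I assume is run against a mongod member of a shard or of the config servers.

```javascript
// Sketch only: `fcvNeedsUpgrade` is a hypothetical helper, not a MongoDB API.
// `paramResult` is assumed to be the document returned by
//   db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 })
// on a mongod (shard or config server member).
function fcvNeedsUpgrade(paramResult, targetVersion) {
  const current = paramResult.featureCompatibilityVersion.version;
  // Compare "major.minor" strings numerically, e.g. "5.0" against "6.0".
  const toParts = (v) => v.split(".").map(Number);
  const [curMajor, curMinor] = toParts(current);
  const [tgtMajor, tgtMinor] = toParts(targetVersion);
  return curMajor < tgtMajor || (curMajor === tgtMajor && curMinor < tgtMinor);
}

// Example with a result shaped like what a 5.0-compatible node reports.
const sampleResult = { featureCompatibilityVersion: { version: "5.0" }, ok: 1 };
console.log(fcvNeedsUpgrade(sampleResult, "6.0")); // true
```

If the helper returns true, the upgrade itself would be issued through mongos with `db.adminCommand({ setFeatureCompatibilityVersion: "6.0" })`.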

But I am facing another issue: the primary shard has not created any chunks. There is still only one chunk.

db.getSiblingDB("saba_ludu").MyCollection.getShardDistribution();

Shard i2a-poc-mgdb-cl-03 at i2a-poc-mgdb-cl-03/i2a-poc-mgdb-cl-03-0.i2a-poc-mgdb-cl-03-svc.i2a-poc.svc.cluster.local:27017,i2a-poc-mgdb-cl-03-1.i2a-poc-mgdb-cl-03-svc.i2a-poc.svc.cluster.local:27017
{
  data: '31.14GiB',
  docs: 3372644,
  chunks: 1,
  'estimated data per chunk': '31.14GiB',
  'estimated docs per chunk': 3372644
}
---
Shard i2a-poc-mgdb-cl-02 at i2a-poc-mgdb-cl-02/i2a-poc-mgdb-cl-02-0.i2a-poc-mgdb-cl-02-svc.i2a-poc.svc.cluster.local:27017,i2a-poc-mgdb-cl-02-1.i2a-poc-mgdb-cl-02-svc.i2a-poc.svc.cluster.local:27017
{
  data: '30.87GiB',
  docs: 3344801,
  chunks: 247,
  'estimated data per chunk': '127.99MiB',
  'estimated docs per chunk': 13541
}
---
Shard i2a-poc-mgdb-cl-01 at i2a-poc-mgdb-cl-01/i2a-poc-mgdb-cl-01-0.i2a-poc-mgdb-cl-01-svc.i2a-poc.svc.cluster.local:27017,i2a-poc-mgdb-cl-01-1.i2a-poc-mgdb-cl-01-svc.i2a-poc.svc.cluster.local:27017
{
  data: '30.86GiB',
  docs: 3344803,
  chunks: 247,
  'estimated data per chunk': '127.94MiB',
  'estimated docs per chunk': 13541
}
---
Totals
{
  data: '3.114100851894496e+23GiB',
  docs: 10062248,
  chunks: 495,
  'Shard i2a-poc-mgdb-cl-03': [
    '0 % data',
    '33.51 % docs in cluster',
    '9KiB avg obj size on shard'
  ],
  'Shard i2a-poc-mgdb-cl-02': [
    '0 % data',
    '33.24 % docs in cluster',
    '9KiB avg obj size on shard'
  ],
  'Shard i2a-poc-mgdb-cl-01': [
    '0 % data',
    '33.24 % docs in cluster',
    '9KiB avg obj size on shard'
  ]
}

Does anybody know why I hit the first compatibility version issue, or why I am not able to balance the primary shard?

Thanks


Solution

  • In MongoDB 6.0 the balancer no longer aims for an even number of chunks per shard (as it did in releases before 6.0); the target is an even distribution of data. Each of your shards has almost exactly 1/3 (i.e. around 33%) of all the data, so this looks correct. The number of documents (around 3.3 million) is also similar on all shards.

    Chunks are used to distribute the data across shards. When you shard a collection that already holds data, you initially have just one single chunk. The balancer splits that chunk and moves the new chunks to the other shards. Once all data is evenly distributed, there is no reason to split the initial chunk any further, so your primary shard is left with only one big chunk.

    I guess that over time the number of chunks on the primary shard will also increase.
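    To make the "balanced by data size" point concrete, here is a rough sketch of the check as I understand it from the 6.0 documentation: a collection counts as balanced when the data size difference between shards is less than three times the configured range size. The helper name `isBalancedBySize` is hypothetical, and the default range size of 128 MiB is an assumption about this cluster.

```javascript
// Sketch only: `isBalancedBySize` is a hypothetical helper, not a MongoDB API.
// Assumption: default range (chunk) size of 128 MiB, and the 6.0 rule that a
// migration is triggered only when two shards differ by at least 3x that size.
function isBalancedBySize(sizesMiB, rangeSizeMiB = 128) {
  const spread = Math.max(...sizesMiB) - Math.min(...sizesMiB);
  return spread < 3 * rangeSizeMiB;
}

// Per-shard data sizes taken from the getShardDistribution() output, in MiB.
const MIB_PER_GIB = 1024;
const shardSizesMiB = [31.14, 30.87, 30.86].map((gib) => gib * MIB_PER_GIB);
console.log(isBalancedBySize(shardSizesMiB)); // true: spread is about 287 MiB, under 384 MiB
```

    With the numbers from the question the spread between the largest and smallest shard is below the threshold, which is consistent with balancerCompliant being true even though one shard holds a single chunk.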