mongodb, mongo-shell

Is there a way to shard a MongoDB collection after loading data in each shard?


I have a 3-shard MongoDB cluster (version 4.4) whose data I want to move to another cluster. Each shard holds 500 million documents of collection "A". To speed things up, I am running mongodump / mongorestore per shard, so that all shards are copied in parallel.

After moving the data, the destination cluster already has data on all 3 shards. Is there a way to update or start the sharding so that mongos recognizes the data already present on all shards?

I tried the shardCollection command (https://www.mongodb.com/docs/manual/reference/command/shardCollection/), but it was successful only for the first shard (rs0) of collection "A".

This is the status output:

'Migration Results for the last 24 hours': {
    '44': "Failed with error 'aborted', from rs0 to rs1",
    '68': "Failed with error 'aborted', from rs0 to rs2",
    '682': 'Success'
  }

I have enabled sharding and created the shard key.

I don't know if this is even possible, because normally a collection is sharded first and MongoDB then splits its data automatically between the available shards based on the shard key.


Solution

  • The most suitable options are as follows (depending on whether you want to back up and restore the whole sharded cluster or only the sharded collection):

    1. Back up and restore the MongoDB sharded cluster via file system snapshots, following the official MongoDB procedure.

    2. Back up and restore the sharded collection via mongodump/mongorestore (this option is best if the target cluster holds other collections and the collection is relatively small; bigger collections can take a long time).

    2.1 Run mongodump through mongos for the whole collection (for a big collection this may take some time, as it has to read all 500M docs from each of the 3 shards).

    2.2 In the target cluster, create the collection, create the necessary indexes, shard it and pre-split it, so that it is still empty but already has the necessary number of chunks on all 3 shards before loading.

    2.3 Run mongorestore for the collection through mongos on the target cluster.
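The pre-split in step 2.2 could be sketched roughly as below. This is only an illustration under assumptions that are not in the question: the database name "mydb", a numeric ranged shard key field "userId", and a key range of 0 to 1.5 billion are all hypothetical. The shard names rs0, rs1 and rs2 are taken from the status output in the question. The helper only builds `split` and `moveChunk` admin command documents; it does not touch the cluster by itself.

```javascript
// Sketch: build pre-split commands for an empty, already sharded collection.
// Assumptions (hypothetical, not from the question): namespace "mydb.A",
// numeric ranged shard key "userId", key values spread over [minKey, maxKey),
// and a key range much larger than the number of chunks.
function buildPreSplitCommands(ns, keyField, minKey, maxKey, chunksPerShard, shards) {
  const totalChunks = chunksPerShard * shards.length;
  const step = (maxKey - minKey) / totalChunks;

  // Lower bound of every planned chunk; the first one is minKey itself.
  const lowerBounds = [];
  for (let i = 0; i < totalChunks; i++) {
    lowerBounds.push(Math.floor(minKey + i * step));
  }

  // One split point per chunk boundary (so totalChunks - 1 splits).
  const splits = lowerBounds.slice(1).map((point) => ({
    split: ns,
    middle: { [keyField]: point },
  }));

  // Contiguous blocks of chunks go to the same shard; "find" selects the
  // chunk that contains the given key value.
  const moves = lowerBounds.map((point, i) => ({
    moveChunk: ns,
    find: { [keyField]: point },
    to: shards[Math.floor(i / chunksPerShard)],
  }));

  return { splits, moves };
}

// Example plan: 2 chunks per shard on the three shards from the question.
const plan = buildPreSplitCommands("mydb.A", "userId", 0, 1500000000, 2, ["rs0", "rs1", "rs2"]);
console.log(JSON.stringify(plan, null, 2));
```

To apply such a plan, one would connect the shell to mongos on the target cluster, run `sh.enableSharding("mydb")` and `sh.shardCollection("mydb.A", { userId: 1 })` on the empty collection, and then pass each generated document to `db.adminCommand(...)`, splits first and moves second. A move whose chunk already sits on the destination shard may be reported as an error or do nothing; either way that chunk is already where it should be. For a hashed shard key, the `numInitialChunks` option of `shardCollection` is probably a simpler way to get the same effect.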