mongodb, iot, sharding, distributed-system, bigdata

Picking a shard key for MongoDB


I want to shard my MongoDB database. I have a high insert rate and want to distribute my documents evenly across two shards.

I have considered range-based sharding because I run range queries, but I cannot find a good shard key.

{
    Timestamp: ISODate("2016-10-02T00:01:00.000Z"),
    Machine_ID: "100",
    Temperature: "50"
}

If this is my document and I have 100,000 different machines, would Machine_ID be a suitable shard key? And if so, how will MongoDB distribute the documents across the shards? Do I have to specify the shard ranges myself, e.g. put Machine_ID 0-49,999 on shard A and 50,000-100,000 on shard B?


Solution

  • I think Machine_ID would be a suitable shard key if your later queries are per machine, e.g. fetching all temperatures for a specific machine over a certain time range. You can read more about shard keys here: Choosing shard key

    MongoDB has two kinds of sharding, hashed sharding and range sharding, which you can read more about here: Sharding strategies. Either way, you don't need to specify the shard ranges yourself; MongoDB takes care of it. When you later need to add a new shard, MongoDB will also rebalance the chunks onto the new shard.
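
To make the two strategies concrete, here is a minimal mongosh sketch of sharding a collection on Machine_ID, either ranged or hashed. The database name "iot", the collection name "readings", and the helper shardReadings are hypothetical names chosen for illustration; sh.enableSharding and sh.shardCollection are the standard mongosh helpers, and the sketch assumes you are connected to the mongos router of a cluster that already has shards.

```javascript
// Sketch for mongosh, connected to a mongos router of a sharded cluster.
// "iot" and "readings" are hypothetical names; substitute your own.
function shardReadings(sh, useHashed) {
  const ns = "iot.readings";

  // Ranged key: documents of one machine stay together in the same chunk range.
  const rangedKey = { Machine_ID: 1 };

  // Hashed key: documents are placed by a hash of Machine_ID, which spreads
  // inserts evenly but gives up range queries on Machine_ID itself.
  const hashedKey = { Machine_ID: "hashed" };

  const key = useHashed ? hashedKey : rangedKey;

  // Harmless on recent server versions, required on older ones.
  sh.enableSharding("iot");

  // No ranges are passed here: MongoDB decides the chunk boundaries itself.
  // If the collection already holds data, an index that starts with the
  // shard key must exist before this call.
  sh.shardCollection(ns, key);

  return key;
}

// In mongosh:
//   shardReadings(sh, false);  // range sharding on Machine_ID
//   sh.status();               // prints how chunks are spread over the shards
```

Neither call takes a shard range as an argument, which is the point made above: chunk boundaries and their placement on shard A or shard B are handled by MongoDB, and sh.status() lets you check the resulting distribution.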