mongodb, sharding

How to build field hash based sharding in MongoDB


I'm looking for a good approach to do the following :

Given a document with some field F, I want to set up sharding so that my application can generate a static hash for that field value (meaning the hash will always be the same if the value is the same) and then use that hash to target the appropriate shard in a normal MongoDB sharded setup.

Questions :

  1. Is this a safe/good approach?
  2. What is a good way to go about implementing it?
  3. Are there any gotchas concerning shard cluster setup that I should be aware of?

Thanks!


Solution

  • I've actually implemented this and it's very doable and results in very good write performance. I'm assuming you have the same reasons for implementing it as I did (instant shard targeting without warmup/balancing, write throughput, no performance degradation during chunk moves/splits, etc.).

    Your questions :

    1. Yes, it is, provided you implement it correctly.
    2. What I do in our in-house ORM layer is mark certain fields in a document as hash sharded fields. Our ORM then automatically generates a hash for that field's value just before writing or reading the document. Outgoing queries are decorated with that hash value (in our case the field is always called "hash"), which MongoDB sharding then uses for shard targeting. Obviously, in this scenario "hash" is always the only shard key.
    3. Most important by far is to generate good hashes. A lot of field values (most commonly an ObjectId-based _id field) are incremental, so your hash algorithm must be such that the hashes generated for incremental values hit different shards. Other issues include selecting an appropriate chunk size.
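To make points 2 and 3 concrete, here is a minimal sketch of what such a hashing step might look like. The answer does not say which hash function the ORM uses, so the choice of MD5 truncated to a signed 64-bit integer is an assumption, as are the helper names `shard_hash` and `decorate`.

```python
import hashlib
import struct


def shard_hash(value) -> int:
    """Return a deterministic signed 64-bit hash of a field value.

    Assumption: MD5 truncated to 8 bytes. Any well-mixing, stable hash
    would do; the point is that incremental inputs land far apart.
    """
    data = str(value).encode("utf-8")
    digest = hashlib.md5(data).digest()
    # Interpret the first 8 bytes as a signed big-endian integer so the
    # result fits a 64-bit integer field.
    return struct.unpack(">q", digest[:8])[0]


def decorate(doc_or_query: dict, field: str) -> dict:
    """Return a copy of a document or query with a "hash" field added.

    "hash" is the shard key name used in the answer; `field` is the
    hash sharded field whose value is hashed.
    """
    out = dict(doc_or_query)
    out["hash"] = shard_hash(out[field])
    return out


if __name__ == "__main__":
    # Both the write and the later query carry the same "hash" value,
    # so the query can be routed to a single shard.
    doc = decorate({"F": "user-1001", "name": "example"}, "F")
    query = decorate({"F": "user-1001"}, "F")
    print(doc["hash"] == query["hash"])
```

Because the same function runs on both the write path and the read path, a query on F always carries the shard key and never has to be broadcast to every shard.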

    Some downsides to consider :

    • Default MongoDB chunk balancing becomes less useful since you typically set up your initial cluster with a lot of chunks (this facilitates adding shards to your cluster while maintaining good chunk spread across all shards). This means the balancer will only start splitting if you have enough data in your premade chunks to require splitting.
    • It's likely to become an officially supported MongoDB feature in the near future, which may make this entire effort a bit wasteful. Like me, you may not have the luxury of waiting, though.
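The pre-made chunks mentioned in the first downside need boundaries. One way to pick them, sketched here under the assumption that the "hash" field holds signed 64-bit integers, is to divide the hash space into equal ranges. The function name `split_points` is hypothetical.

```python
from typing import List


def split_points(num_chunks: int, lo: int = -2**63, hi: int = 2**63) -> List[int]:
    """Return the num_chunks - 1 boundaries that divide [lo, hi) evenly.

    Assumption: hash values are uniformly spread over the signed 64-bit
    range, so equal-width ranges receive roughly equal amounts of data.
    """
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    width = (hi - lo) // num_chunks
    return [lo + i * width for i in range(1, num_chunks)]


if __name__ == "__main__":
    for boundary in split_points(4):
        print(boundary)
```

Each boundary could then be handed to MongoDB's `split` admin command as the `middle` value for the "hash" key, and the resulting chunks spread across shards with `moveChunk`. How many chunks to create up front depends on how many shards you expect to add later.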

    Good luck.

    UPDATE 25/03/2013 : As of version 2.4, MongoDB supports hashed indexes and hashed shard keys natively.
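With native support, the application no longer has to compute or store the hash itself. As a rough sketch, the command document for sharding a collection on a hashed key looks like the following; the helper name `hashed_shard_command` and the namespace `mydb.docs` are made up for illustration.

```python
def hashed_shard_command(namespace: str, field: str) -> dict:
    """Build a shardCollection command that uses a hashed shard key.

    namespace is "<database>.<collection>"; field is the field that
    MongoDB should hash internally.
    """
    return {"shardCollection": namespace, "key": {field: "hashed"}}


if __name__ == "__main__":
    cmd = hashed_shard_command("mydb.docs", "F")
    print(cmd)
    # With a driver such as pymongo, the command would be sent to the
    # admin database of a mongos, for example:
    #   MongoClient("mongodb://localhost:27017").admin.command(cmd)
    # (the connection string is an assumption; sharding must already be
    # enabled for the database).
```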