mongodb sharding

Good Shard Keys in MongoDB


From the book Scaling MongoDB:

The general case

We can generalize this to a formula for shard keys: {coarseLocality : 1, search : 1}

So my question is: is that correct? Shouldn't it be the opposite order for better write distribution?

Also from the book:

This pattern continues: everything will always be added to the “last” chunk, meaning everything will be added to one shard. This shard key gives you a single, undistributable hot spot.
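The hot spot described in that passage can be sketched with a toy model. This is only an illustration under stated assumptions, not how MongoDB is implemented: the split points, the key values, and the helper names `chunkFor` and `countInserts` are all hypothetical, and real clusters also split and migrate chunks over time, which this sketch ignores.

```javascript
// Hypothetical chunk layout: split points on the leading shard-key field.
// Chunk i covers [splitPoints[i - 1], splitPoints[i]); the last chunk is open-ended.
const splitPoints = [100, 200, 300];

// Return the index of the chunk whose range contains the key.
function chunkFor(key) {
  const i = splitPoints.findIndex(p => key < p);
  return i === -1 ? splitPoints.length : i;
}

// Count how many inserts each chunk receives.
function countInserts(keys) {
  const counts = new Array(splitPoints.length + 1).fill(0);
  for (const k of keys) counts[chunkFor(k)] += 1;
  return counts;
}

// Ascending keys (like a timestamp-prefixed ObjectId): every new key is
// larger than all existing ones, so it falls past the last split point.
const ascendingKeys = Array.from({ length: 800 }, (_, n) => 300 + n);

// Keys spread over the whole range (like a well-distributed leading field).
const spreadKeys = Array.from({ length: 800 }, (_, n) => (n * 37) % 400);

const ascendingCounts = countInserts(ascendingKeys);
const spreadCounts = countInserts(spreadKeys);
console.log(ascendingCounts, spreadCounts);
```

In this model every ascending insert lands in the last chunk, while the spread keys are divided evenly across all four chunks.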

So, say that my app always searches by user_id and reads the last entries in the collection.

What is the best shard key for me to use? This:

{_id:1, user_id:1}

or:

{user_id:1,_id:1}

Solution

  • Kristina (author of Scaling MongoDB) wrote a blog post which has some example strategies explained in the guise of a game: How to Choose a Shard Key: The Card Game.

    There are many considerations to choosing a good shard key based on your application requirements and use cases.

    The general advice of {coarseLocality : 1, search : 1} order is to ensure there is some locality of your data for reading.

    So in your case, you would most likely want: {user_id:1,_id:1}.

    That will provide some locality of data for the same user_id when querying, and ideally your common queries will be able to get their data from a single shard.

    The opposite order may provide for better write distribution (assuming _id is not a monotonically increasing key like a default ObjectId) but a potential downside is reliability: if your data for a read query is scattered across all shards, you will have retrieval problems if any one shard is down.

    So, say that my app always searches by user_id and reads the last entries in the collection.

    If you commonly search by user_id (and without _id), this will also affect your choice of shard key and index optimization. To find the last entries, MongoDB will have to do a sort; you will want that sort to happen on a single shard rather than having to gather the data from all shards and then sort it. If your _id happens to be date-based, having it as part of the shard key would help when finding the last entries.
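The locality argument in this answer can be sketched with a small in-memory model. Everything here is an assumption made for illustration: the sample data, the equal-sized chunks (one per shard), and the helper names `compareByKey`, `chunkify`, and `chunksForUser` are hypothetical, and real chunk boundaries depend on your data and the balancer.

```javascript
// Hypothetical data: 40 entries, _id increases with insertion order, 4 users interleaved.
const docs = Array.from({ length: 40 }, (_, i) => ({ _id: i, user_id: i % 4 }));

// Order documents field by field, following the shard-key field order.
function compareByKey(fields) {
  return (a, b) => {
    for (const f of fields) {
      if (a[f] !== b[f]) return a[f] < b[f] ? -1 : 1;
    }
    return 0;
  };
}

// Sort by the shard key and cut the ordered range into equal chunks,
// a simplified stand-in for range-based chunking (one chunk per shard here).
function chunkify(fields, chunkCount) {
  const sorted = [...docs].sort(compareByKey(fields));
  const size = Math.ceil(sorted.length / chunkCount);
  return Array.from({ length: chunkCount }, (_, c) => sorted.slice(c * size, (c + 1) * size));
}

// Chunks that a query on user_id alone has to visit.
function chunksForUser(fields, userId) {
  return chunkify(fields, 4).filter(chunk => chunk.some(d => d.user_id === userId));
}

const userFirst = chunksForUser(["user_id", "_id"], 2);
const idFirst = chunksForUser(["_id", "user_id"], 2);

// Last 3 entries for user 2: with user_id first, a single chunk can answer alone.
const latest = userFirst[0]
  .filter(d => d.user_id === 2)
  .sort((a, b) => b._id - a._id)
  .slice(0, 3)
  .map(d => d._id);

console.log(userFirst.length, idFirst.length, latest);
```

With user_id first, the query for one user touches a single chunk in this model; with _id first, the same user's entries are spread over all four. On a real cluster the corresponding mongosh calls would be along the lines of `sh.shardCollection("app.entries", { user_id: 1, _id: 1 })` followed by `db.entries.find({ user_id: 2 }).sort({ _id: -1 }).limit(3)`, where the namespace `app.entries` is an assumed placeholder.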