
MongoDB compound key sharding and chunks vs. disk size


After going through the 10gen manual, I can't seem to understand how sharding works in the following scenarios. I will use a document with the fields userid, lastupdatetime and data for the example:

  1. Chunks contain an ordered range of shard key values. So if my shard key is userid, I expect chunk1 to contain the ids user1...user999 (up to the 64 MB limit) and chunk2 to hold user1000...user1999. Is that correct?

  2. In the previous case, let's say that chunk1 is on shard1 and chunk2 is on shard2. If user1 (which is on shard1) has lots and lots of documents and all other users have 1-2 documents, shard1's disk usage will be a lot bigger than shard2's. If this is correct, how does MongoDB mitigate that?

  3. How is a compound shard key ordered inside the chunks? For example, if the compound shard key is userid+lastupdatetime, is it safe to assume the following (assuming user1 has lots of documents): chunk1 will contain the values user1, 10:00:00; user1, 10:01:00; ...; user1, 14:04:11; ... (up to the 64 MB limit) and chunk2 will hold user1, 14:05:33; user2, 9:00:00; ...; user34, 19:00:00; ...

    Is that correct?


Solution

    1. Yes, you are correct.
    2. Your shard key determines where chunks can be split. If your shard key is "userid", then the finest granularity at which a chunk can be split is a single userid value, so all documents of one user always stay in the same chunk. MongoDB sizes chunks by the amount of data they hold, not by the number of key values. So it is very likely that chunk1 (on shard1) holds, for example, only the documents with userids in the range 1..10, while chunk2 (on shard2) holds the documents with userids 11..1000. MongoDB automatically picks the split points so that each chunk stays within the size limit.
    3. That is correct as well. With a compound shard key, the "unit" in which documents can be divided is the combination of both fields. So you can have { MinValue } to { user1, 12:00:00 } in chunk one, { user1, 12:00:01 } to { user2, 04:00:00 } in chunk two and { user2, 04:00:01 } to { MaxValue } in chunk three. MinValue and MaxValue are special values that are smaller and larger, respectively, than every other value. The first chunk doesn't actually start with the first value (in your example { user1, 10:00:00 }) but rather with MinValue.
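
To make points 2 and 3 concrete, here is a minimal Python sketch of how a shard key value maps to a chunk. It is only a simulation, not MongoDB's actual implementation: the helper `chunk_index`, the split points and the sample documents are all made up for illustration, and the timestamps are assumed to be zero-padded strings so that they sort chronologically. Python tuples compare field by field, which is a reasonable stand-in for how a compound key is ordered.

```python
from bisect import bisect_right


def chunk_index(split_points, shard_key):
    """Return the 0-based index of the chunk whose range contains shard_key.

    split_points is the sorted list of lower bounds of every chunk after the
    first one. The first chunk implicitly starts at MinValue and the last
    chunk implicitly ends at MaxValue, so neither needs to be stored here.
    """
    return bisect_right(split_points, shard_key)


# Hypothetical split points for the compound key (userid, lastupdatetime),
# matching the three chunks described in point 3.
compound_splits = [("user1", "12:00:01"), ("user2", "04:00:01")]

# Hypothetical documents, reduced to their shard key fields.
docs = [
    ("user1", "10:00:00"),
    ("user1", "14:05:33"),
    ("user2", "09:00:00"),
]

# With the compound key, user1's documents can land in different chunks.
compound_placement = [chunk_index(compound_splits, d) for d in docs]
print(compound_placement)  # [0, 1, 2]

# Hypothetical split point when the shard key is userid alone: every
# document of user1 has the same key value, so they all share one chunk.
userid_splits = ["user2"]
userid_placement = [chunk_index(userid_splits, userid) for userid, _ in docs]
print(userid_placement)  # [0, 0, 1]
```

The second result shows why a userid-only key cannot spread a single heavy user across chunks, while the compound key can place a split point in the middle of that user's documents.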