After going through the 10Gen manual, I can seem to understand how sharding works in the following scenarios. I will use a document with userid, lastupdatetime, data for the example:
Chunks contain an ordered list of Shard Ids. so if my shard id is userid i expect chunk1 to contain a list of ids: user1...user999(up to the 64mb limit) and chunk2 will hold user1000...user1999. is that correct?
In the previous case, lets say that chunk1 is on shard1 and chunk2 is on shard2. if user1 (which is on shard1) has lots of lots of documents and all other users have 1-2 documents, it will make shard1 disk usage a lot bigger than shard 2 disk usage. If this is correct, what's MongoDB mitigation in that case?
How Compound shard key is ordered inside the chunks? for example, if the compound shard key is userid+lastupdatetime, is it safe to assume the following (assuming user1 has lots of documents): chunk1 to contain a list of values: user1, 10:00:00; user1, 10:01:00...;user1,14:04:11..(up to the 64mb limit) and chunk2 will hold user1,14:05:33; user2,9:00:00...user34, 19:00:00;..
is that correct?
{ MinValue }
to { user1, 12:00:00 }
in chunk one, { user1, 12:00:01 }
to { user2, 04:00:00 }
in chunk two and { user2, 04:00:01 }
to { MaxValue }
on chunk three. MinValue
and MaxValue
are special values that are either smaller than everything else, or larger. The first chunk actually doesn't start with the first value (in your example { user1, 10:00:00 }
but rather with MinValue
.