Same shards across different MongoDB collections

I have a collection A containing one type of documents, and a second collection B containing another kind of documents.

There are multiple documents in collection B that have the same value for the field "b" which references field "a" in the collection A.

If we shard the two collections A and B on "a" and "b" respectively, can we be assured that documents in collection A having "a=foobar" will be co-located with documents in collection B having "b=foobar"?

Solution

If we shard the two collections A and B on "a" and "b" respectively, can we be assured that documents in collection A having "a=foobar" will be co-located with documents in collection B having "b=foobar"?

No. Shard key indexes are defined per collection, and (as of MongoDB 4.0) collections are balanced independently. Even if two collections have identical shard keys, there is no guarantee that their chunk ranges or shard assignments will align.
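To make the point concrete, here is a toy model in Python rather than real MongoDB code. It assumes ranged sharding on string keys, and the split points and shard names (shard0, shard1, shard2) are invented for illustration. Each collection has its own chunk boundaries and its own chunk-to-shard assignment, so the same key value can resolve to different shards.

```python
import bisect

def shard_for(key, split_points, chunk_owners):
    """Return the shard owning the chunk that contains `key`.

    split_points: sorted chunk boundaries; each chunk includes its lower
    bound and excludes its upper bound.
    chunk_owners: one shard name per chunk (len(split_points) + 1 entries).
    """
    return chunk_owners[bisect.bisect_right(split_points, key)]

# Hypothetical layouts: each collection was split and balanced on its own.
a_splits, a_owners = ["g", "p"], ["shard0", "shard1", "shard2"]
b_splits, b_owners = ["c", "m", "t"], ["shard2", "shard1", "shard0", "shard1"]

key = "foobar"
shard_a = shard_for(key, a_splits, a_owners)  # chunk below "g" in A
shard_b = shard_for(key, b_splits, b_owners)  # chunk ["c", "m") in B
print(shard_a, shard_b)  # shard0 shard1
```

In this made-up layout the documents with a=foobar and b=foobar sit on different shards even though both collections are sharded on the same values.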

If you plan to use server-side queries to combine data from these collections using $lookup or $graphLookup, note that (as of MongoDB 4.0) the collection you are looking up from cannot be sharded. For this use case you would only shard one of the collections. For sharded lookup support there are some relevant improvements to watch/upvote in the MongoDB issue tracker: SERVER-29159 (sharded $lookup) and SERVER-27533 (sharded $graphLookup).
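As a rough sketch of that setup, the pipeline below is written as plain Python data. It would run against collection A (which may be sharded) and look up matching documents from collection B (which must stay unsharded under the restriction above). The helper name lookup_pipeline and the output field name b_docs are made up for this example.

```python
def lookup_pipeline(a_value):
    """Build an aggregation pipeline for collection A that pulls in the
    documents from B whose "b" field equals A's "a" field."""
    return [
        {"$match": {"a": a_value}},
        {"$lookup": {
            "from": "B",           # looked-up collection, assumed unsharded
            "localField": "a",     # field in A
            "foreignField": "b",   # field in B referencing A.a
            "as": "b_docs",        # hypothetical name for the joined array
        }},
    ]

pipeline = lookup_pipeline("foobar")
print(pipeline)
```

With a driver such as PyMongo, a list like this is what you would pass to the aggregate method of collection A.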

There are a few possible approaches to co-locating data, but all have caveats:

Denormalize: duplicate the most commonly used fields from A into B. This can speed up data retrieval by avoiding the need for joins, but adds some overhead for updates and data storage.
Embed the related data so you have a single sharded collection. This will not be ideal if your collections have very different growth or access patterns, or a large one-to-many relationship.
Manage the data distribution manually: disable balancing for these collections, manually split (or pre-split) chunks so the chunk ranges are identical, and use zone sharding for shard affinity.
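The manual approach could be scripted roughly as follows. This sketch only builds the admin command documents (addShardToZone, split, updateZoneKeyRange) so that both collections get identical split points and zone ranges; it does not connect to a cluster. The namespaces, zone names, shard names and key ranges are assumptions for illustration, and the function colocation_commands is a hypothetical helper, not part of any driver.

```python
def colocation_commands(collections, zones):
    """Build admin commands that give several collections the same
    chunk boundaries and zone ranges.

    collections: list of (namespace, shard_key_field) pairs.
    zones: list of (zone_name, shard_name, lower, upper) tuples, with
           lower inclusive and upper exclusive.
    """
    commands = []
    # Associate each shard with its zone once, independent of collection.
    for zone, shard, _lower, _upper in zones:
        commands.append({"addShardToZone": shard, "zone": zone})
    for namespace, field in collections:
        for zone, _shard, lower, upper in zones:
            # Split at the lower bound so chunk boundaries match across
            # collections (this would fail if the boundary already exists).
            commands.append({"split": namespace, "middle": {field: lower}})
            commands.append({
                "updateZoneKeyRange": namespace,
                "min": {field: lower},
                "max": {field: upper},
                "zone": zone,
            })
    return commands

# Hypothetical cluster layout: two zones, each backed by one shard.
commands = colocation_commands(
    collections=[("test.A", "a"), ("test.B", "b")],
    zones=[("zone1", "shard0", "a", "n"), ("zone2", "shard1", "n", "z")],
)
for command in commands:
    print(command)
```

Each document is meant to be run against the admin database through mongos. Disabling balancing for the two collections (for example with sh.disableBalancing() in the mongo shell) and moving any chunks that are already on the wrong shard are left out of this sketch.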

For more information on relationship patterns, the Six Rules of Thumb for MongoDB Schema Design blog series is a helpful read. It doesn't cover sharding but the general data model considerations still apply.