I have a collection A containing one type of document, and a second collection B containing another type.
There are multiple documents in collection B that have the same value for the field "b" which references field "a" in the collection A.
If we shard the two collections A and B on "a" and "b" respectively, can we be assured that documents in collection A having "a=foobar" will be co-located with documents in collection B having "b=foobar"?
Shard key indexes are defined per collection, and (as at MongoDB 4.0) collections are balanced independently. Even if two collections have identical shard keys, there is definitely no guarantee that the chunk ranges or shard assignments will align.
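To make the point concrete, here is a toy model in Python. It is not MongoDB's balancer: the split points, shard names, and the `shard_for` helper are all made up for illustration. It only shows that two collections with the same key values can have different chunk boundaries and different chunk owners, so the same key can resolve to different shards.

```python
from bisect import bisect_right

def shard_for(key, split_points, chunk_owners):
    """Return the shard that owns the chunk whose range contains key."""
    # Chunk i covers [split_points[i-1], split_points[i]), so the lower
    # bound is inclusive; bisect_right gives the index of that chunk.
    return chunk_owners[bisect_right(split_points, key)]

# Hypothetical layouts: each collection has its own split points and its
# own chunk-to-shard assignment, even though the key values are the same.
a_splits, a_owners = ["g", "p"], ["shard0", "shard1", "shard2"]
b_splits, b_owners = ["c", "k", "t"], ["shard2", "shard1", "shard0", "shard2"]

key = "foobar"
shard_a = shard_for(key, a_splits, a_owners)  # where A's a=foobar lives
shard_b = shard_for(key, b_splits, b_owners)  # where B's b=foobar lives
print(shard_a, shard_b)
```

In this made-up layout the two lookups land on different shards, which is the situation the answer warns about.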
If you plan to use server-side queries to combine data from these collections using $lookup or $graphLookup, note that additional collections you are looking up from cannot currently be sharded. For this use case you would only shard one of the collections. For sharded lookup support there are some relevant improvements to watch/upvote in the MongoDB issue tracker: SERVER-29159 (sharded $lookup) and SERVER-27533 (sharded $graphLookup).
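As a sketch of the "only shard one of the collections" setup, the pipeline below assumes A is the sharded collection the aggregation runs on and B is left unsharded as the lookup target. The collection names come from the question. The output field `b_docs` and the `db` handle are hypothetical.

```python
# Assumed layout: A is sharded on "a"; B is unsharded and holds documents
# whose "b" field references A's "a" field.
pipeline = [
    {"$match": {"a": "foobar"}},
    {
        "$lookup": {
            "from": "B",          # looked-up collection: must be unsharded here
            "localField": "a",    # field in A
            "foreignField": "b",  # field in B that references A.a
            "as": "b_docs",       # hypothetical name for the joined array
        }
    },
]

# With PyMongo and a running deployment, this would be executed as:
#   results = list(db["A"].aggregate(pipeline))
# where db is a hypothetical pymongo Database handle.
```

Each result document from A would then carry an array of its matching B documents.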
There are a few possible approaches to co-locating data, but all have caveats:
Embed related data from A into B. This can speed up data retrieval by avoiding the need for joins, but adds some overhead for updates and data storage.
For more information on relationship patterns, the Six Rules of Thumb for MongoDB Schema Design blog series is a helpful read. It doesn't cover sharding, but the general data model considerations still apply.
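As a rough illustration of copying A's data into B documents so that reads need no join, here is a minimal sketch using plain Python dictionaries. The helper `embed_a_into_b`, the `a_doc` field, and the sample documents are invented for the example; a real migration would write the merged documents back to MongoDB.

```python
def embed_a_into_b(a_docs, b_docs):
    """Return copies of b_docs with the matching A document embedded."""
    a_by_key = {doc["a"]: doc for doc in a_docs}
    embedded = []
    for b in b_docs:
        merged = dict(b)  # shallow copy so the input list is left untouched
        merged["a_doc"] = a_by_key.get(b["b"])  # None when no A doc matches
        embedded.append(merged)
    return embedded

# Hypothetical sample data: two B documents reference the same A document.
a_docs = [{"a": "foobar", "name": "Foo Bar"}]
b_docs = [{"_id": 1, "b": "foobar"}, {"_id": 2, "b": "foobar"}]

embedded = embed_a_into_b(a_docs, b_docs)
print(embedded)
```

The A document now appears once per referencing B document. That duplication is the storage and update overhead mentioned above: changing the A document means touching every B document that embeds it.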