I have two schemas for products and folders
const ProductSchema = new Schema<TProductSchema>(
{
productName: String,
vendor: { type: Schema.Types.ObjectId, ref: 'Vendor' },
folder: { type: Schema.Types.ObjectId, ref: 'ProductFolder' },
}
);
const ProductFolderSchema = new Schema<TProductFolderSchema>(
{
folderName: { type: String, required: true },
parent: { type: Schema.Types.ObjectId, ref: 'ProductFolder' },
children: [{ type: Schema.Types.ObjectId, ref: 'ProductFolder' }],
}
);
I want the following functionality in my application:
When user selects a folder, all products whose folder
is equal to selected folder _id
and whose folder
is direct or indirect children of selected folder should be returned.
Folders and Products are separate entities: user can create and delete folders independently of products and vice versa.
The problem with current schema is that it's hard to implement finding products whose folder
is a child of selected folder.
I expect each folder to contain up to 100.000 products.
I was thinking if adding products
field that would contain product's _id
s to ProductFolder
is a good idea.
What schema design would you recommend? Should I add products
field or leave it as is and come up with a solution to find nested folders? Or maybe there's some other approach?
When user selects a folder, all products whose folder is equal to selected folder _id and whose folder is direct or indirect children of selected folder should be returned.
Here is a way to implement this, for example the models are:
product: {
_id: <ObjectId>,
name: <string>,
vendor: <number>,
}
folder: {
_id: <ObjectId>,
parent: <ObjectId>,
child: [ <ObjectId>, <ObjectId>, ... ]
products: [ <ObjectId>, <ObjectId>, ... ],
}
Example data (shows some fields with data for a sample querying):
products:
[
{ _id: "p11" }, { _id: "p12" }, { _id: "p13" },
{ _id: "p14" }, { _id: "p15" }, { _id: "p16" },
{ _id: "p17" }, { _id: "p18" }, { _id: "p19" },
{ _id: "p20" }, { _id: "p21" }, { _id: "p90" },
{ _id: "p91" }, { _id: "p92" },
]
folders:
[
{ _id: 1, parent: "...", children: [ 2, 3 ], products: [ "p11", "p12" ] },
{ _id: 2, parent: "...", children: [], products: [ "p13" ] },
{ _id: 3, parent: "...", children: [ 4 ], products: [ "p20", "p21" ] },
{ _id: 4, parent: "...", children: [ 40 ], products: [ "p16", "p17" ] },
{ _id: 9, parent: "...", children: [ ], products: [ "p92" ] },
{ _id: 40, parent: "...", children: [ ], products: [ "p90", "p91" ] },
]
The query is to get all the products for the current folder and its sub folders recursively. Assuming the selected folder is { _id: 1 }
, the aggregation query uses $graphLookup
and all the resulting "products" are listed in the all_products
array.
db.folders.aggregate( [
{
$match: { _id: 1 }
},
{
$graphLookup: {
from: "folders",
startWith: "$children",
connectFromField: "children",
connectToField: "_id",
maxDepth: 99999,
as: "hierarchy"
}
},
{
$project: {
all_products: {
$reduce: {
input: "$hierarchy",
initialValue: "$products",
in: { $concatArrays: [ "$$value", "$$this.products" ] }
}
}
}
},
] )
Example's output:
{
"_id" : 1,
"all_products" : [
"p11",
"p12",
"p16",
"p17",
"p13",
"p20",
"p21",
"p90",
"p91"
]
}
Note the $graphLookup
outputs data in no particular order. Further, you can use $lookup
to fetch the product details from the products
collection (this stage is not shown in the query). The query runs in mongosh
.
With this design, modifying data may require changing data in both collections. For example, if you delete a product, the data needs to be deleted from both collections.