I need to make a data model that has a recursive relationship in a way that each user has two children users and each children user has another two children users. That behavior is repeated many times growing the tree. I want to use MongoDB as my database. I had read that is not recomended put a large set of nodes inside a document, so I was thinking make model as a relationnal model or use other database to achieve that goal. What do you recomend me?
I would say that it really depends on your data access patterns. You can consider (at least) two scenarios
This gives you the best performance when it comes to reads - you just query by id or by any other parent-level field. It would be recommended for simple data access like get whole tree, update whole tree as single business operation. The downside is that querying by child documents and updating them becomes more complicated (but still possible - array filters). The other thing you need to keep in mind is that MongoDB has a 16 MB limitation for single document size. That's a lot but you need to know how far your tree can grow.
You can store multiple documents in the same collection and each one can point to it's parent:
{
_id: 1,
name: "root",
parent: null
},
{
_id: 2,
name: "child",
parent: 1
}
This will make working with child documents easier (when your business operation doesn't include the parent). The drawback is that querying to get the tree will be slower as the data needs to be "joined" however MongoDB gives you a possibility to retrieve parent with its children by running $graphLookup:
{
$graphLookup: {
from: "sameCollection",
startWith: "$parent",
connectFromField: "parent",
connectToField: "child",
as: "tree"
}
}
$graphLookup
works recursively, you can specify maxDepth
.
You can also consider hybrid approach - store the tree separately for fast reads and keep the data relational for child-level operation. The challenge in such case is how to handle data updates since the same value is duplicated in more than one document. You can consider eventual consistency models like rebuilding your tree every hour based on current data or just trigger tree reprocessing when child is rebuilt - this really depends on your business requirements.