Search code examples
mongodbtreeaggregation-frameworkancestordescendant

MongoDB Tree Model: Get all ancestors, Get all descendants


I have an arbitrary tree structure.

Example data structure:

root
  |--node1
  |     |--node2
  |     |     |--leaf1
  |     |
  |     |--leaf2
  |
  |--node3
        |--leaf3

Each node and leaf has 2 properties: id and name.


The important queries:

1.: A leaf id is given. The query should return the whole path from root to that leaf, with all node's id and name properties.

It's not important if the return value is an sorted array of nodes or if it's an object where the nodes are nested.

Example: If the id of leaf2 is given, the query should return: root(id, name), node1(id, name), leaf2(id, name).


2.: Given any node id: Get the whole (sub)tree. Here it would be nice to retrieve a single object where each node has a children array.


Thoughts, trials and errors:

1.: First I tried to simply model the tree as a single JSON document, but then the query would become impossible: There's no way to find out at which nesting level the leaf is. And if I knew the whole path of ids from root to the leaf, I'd had to use a projection with multiple positional operators and that's not supported by MongoDB at the moment. Additionally it's not possible to index the leaf ids because the nesting can be infinite.

2.: Next idea was to use a flat data design, where each node has an array which contains the node's ancestor ids:

{
  id: ...,
  name: ...,
  ancestors: [ rootId, node1Id, ... ]
}

This way I'd have to do 2 queries, to get the whole path from root to some node or leaf, which is quite nice.

Questions:

If I choose data model 2.: How can I get the whole tree, or a subtree?

Getting all descendants is easy: find({ancestors:"myStartingNodeId"}). But those will of course be not sorted or nested.

Is there a way using the aggregation framework or a completely different data model to solve this problem?

Thank you!


Solution

  • Here's what data structure I finally came up with. It's optimized for read queries. Some write queries (like moving subtrees) can be painful.

    {
      id: "...",
      ancestors: ["parent_node_id", ..., "root_node_id"], // order is important!
      children: ["child1_id", "child2_id", ...]
    }
    

    Benefits:

    • Easy to get all documents for a sub-tree

    • Easy to get all documents from some node to the root

    • Easy to check if some document is parent/child/ancestor/descendant of some node

    • Children are sorted. Can be easily moved by changing the children array order

    How to use it:

    • Get by ID: findOne({ id: "..." })

    • Get Parent: findOne({ children: "..." })

    • Get all Ancestors: first do Get by ID, then grab the ancestors array and look for all documents that match the given list of IDs

    • Get all Children: find({ 'ancestors.0': "..." })

    • Get all Descendants: find({ ancestors: "..." })

    • Get all Descendants up to x generations: find({ $and: [ {ancestors: "..."}, {ancestors: {$size: x}} ] })

    Drawbacks:

    • The application code has to take care of correct order.

    • The application code has to build nested objects (maybe this is possible using MongoDB Aggregation framework).

    • Every insert must be done using 2 queries.

    • Moving whole sub-trees between nodes must update a lot of documents.