Search code examples
djangompttdjango-treebeard

Django treebeard what are differences between AL, NS, MP


I'm trying to make a model for categorizing some objects.

I already tried using django-mptt to easily retrieve related categories, and now I'm searching different solutions to find the best one.

I can't find out though what are main differences between Materialized Path, Adjacency List and Nested Set. Wikipedia didn't give me a short answer, all I know is mptt is probably Nested Set...

Can anyone explain it to me in few words ?


Solution

  • It's easier to explain with examples than with a few words. Consider the sample tree, storing names:

    William
        Jones
        Blake    
            Adams        
        Tyler    
    Joseph
        Miller
        Flint
    

    Materialized Path means each node stores its full path encoded. For instance, you can store its index using a dot as a delimiter

    Name        Path
    William     1
    Jones       1.1
    Blake       1.2
    Adams       1.2.1
    Tyler       1.3
    Joseph      2
    Miller      2.1
    Flint       2.2
    

    So, Adams knows its full name is Wiliam Blake Adams, because it has the 1.2.1 path, pointing to the 1 node at first level — William — to the 1.2 node at level 2 — Blake — and 1.2.1 node at level 3 — Adams.

    Adjacency List means the tree is stored by keeping a link to some adjacent nodes. For instance, you can store who is the parent and who is the next sibling.

    Name        Parent     Next
    William     null       Joseph
    Jones       William    Blake
    Blake       William    Tyler
    Adams       Blake      null
    Tyler       William    null
    Joseph      null       null    
    Miller      Joseph     Flint
    Flint       Joseph     null
    

    Notice that it could be as simple as just storing the parent, if we don't need to keep the children of a node ordered. Now Adams can get all his ancestors recursively until null to find his full name.

    Nested sets means you store each node with some index, usually a left and right value, assigned to each one as you traverse the tree in DFS order, so you know its descendants are within those values. Here's how the numbers would be assigned to the example tree:

      1 William 10
        2 Jones 3
        4 Blake 7   
            5 Adams 6
        8 Tyler 9
    11 Joseph 16
        12 Miller 13 
        14 Flint 15
    

    And it would be stored as:

    Name        left   right
    William     1      10
    Jones       2      3
    Blake       4      7
    Adams       5      6
    Tyler       8      9  
    Joseph      11     16
    Miller      12     13
    Flint       14     15
    

    So, now Adams can find its ancestors by querying who has a lower left AND a higher right value, and sort them by the left value.

    Each model has its strengths and weaknesses. Choosing which one to use really depends on your application, the database you're using and what kind of operations you'll be doing most often. You can find a good comparison here.

    The comparison mentions a fourth model that isn't very common (I don't know of any other implementation but my own) and very complicated to explain in a few words: Nested Interval via Matrix Encoding.

    When you insert a new node in a nested set you have to re-enumerate everyone who is ahead of it in the traversal. The idea behind nested intervals is that there's an infinite number of rational numbers between any two integers, so you could store the new node as a fraction of its previous and next nodes. Storing and querying fractions can be troublesome, and this leads to the matrix encoding technique, which transforms those fractions in a 2x2 matrix and most operations can be done by some matrix algebra that isn't obvious at first sight but can be very efficient.

    Treebeard has very straightforward implementations of each one of Materialized Path, Nested Sets and Adjacency Lists, with no redundancy. django-mptt actually uses a mix of Nested Sets and Adjacency Lists, since it also keeps a link to the parent and can use it to both query children more efficiently, and to rebuild the tree in case it gets messed up by someone.