Search code examples
pythondjangohierarchical-datadjango-mpttmptt

efficient function to retrieve a queryset of ancestors of an mptt queryset


Does anybody have an efficient algorithm to retrieve all ancestors of an mptt queryset? The best I could think of so far is something like this:

def qs_ancestors(queryset):
    if isinstance(queryset, EmptyQuerySet):
        return queryset
    queryset_aggs = queryset.values_list('tree_id', 'level').annotate(max_lft=Max('lft'), min_rght=Min('rght'))
    new_queryset = queryset.none()
    for tree_id, level, max_lft, min_rght in queryset_aggs:
        ancestors = MyModel.objects.filter(
           tree_id=tree_id,
           level__lt=level, 
           lft__lte=max_lft,
           rght__gte=min_rght,
        )
        new_queryset = ancestors | new_queryset
    return new_queryset

There are two problems with this approach:

  1. It fails if there are branches that aren't next to each other (ie it doesn't really work)
  2. It is highly inefficient because it ends up have number_of_trees*number_of_levels clauses in the final query, which can get very large very fast

I am open to caching the ancestors somewhere else, but I cannot think of a way to do efficiently. I considered adding a field with a comma separated list of ancestor's ids and then doing a GROUP_CONCAT (I am in MySQL) inside an extra, but I think that could get huge/slow.


Solution

  • How about:

    def qs_ancestors(queryset):
        if isinstance(queryset, EmptyQuerySet):
            return queryset
        new_queryset = queryset.none()
        for obj in queryset:
            new_queryset = new_queryset | obj.get_ancestors()
    return new_queryset
    

    It's still len(queryset) clauses. You could potentially reduce the number of clauses a bit by doing a preprocess pass that removes objects that are ancestors of other objects in the queryset, something like:

    min_obj_set = []
    for obj in queryset.order_by('tree_id', '-level'):
        for obj2 in min_obj_set:
            if obj.is_ancestor_of(obj2):
                break
        else:
            min_obj_set.append(obj)
    

    Although the above snippet is only an example, you'll probably want to use a BST if your querset contains a significant amount of objects.

    You'll have to test if this yeilds an increase in speed vs. the larger DB query, though.