Search code examples
djangodjango-modelsdjango-treebeard

How to prevent N+1 queries when fetching ancestors with Django Treebeard?


We are using Django Treebeard's materialized path to model an organizational hierarchy as follows:

Organizational Hierarchy Example

Now each node in the organizational tree can have multiple tasks:

class Organization(MP_Node):
    node_order_by = ['name']
    name = models.CharField(max_length=100)

class Task(models.Model):
    organization = models.ForeignKey(Organization, on_delete=models.CASCADE)
    description= models.TextField()

Given a list of tasks, we wish to include the full organizational path of each task in the result. How can we achieve this without the need for N+1 queries?

Expected result for organization Factory 1 could be for example:

Task name Organization Path
Task 1 MyCompany/Factory 1/Maintenance
Task 2 MyCompany/Factory 1/Operations
Task 3 MyCompany/Factory 1
Task 4 MyCompany/Factory 1/Operations

Solution

  • django-treebeard stores the materialized path in the path column as a string like this: 000100020005002I. In this example, the following rows are its ancestors (given the default step length of 4):

    0001
    00010002 
    000100020005  
    000100020005002I
    

    What django-treebeard does is to split a page's path into the aforementioned bits in Python and then perform a database query like so:

    Organization.objects.filter(path__in=['0001', '00010002', '000100020005'])`
    

    To avoid the n+1 query problem, we need to avoid splitting the path in Python and perform the ancestor lookup in the database via a subquery.

    Pattern matching can be used to see if an ancestor's path is contained in the child's path: 00010002 matches 000100020005002I when the candidate's path is used as a pattern for the path of the organization in question:

    000100020005002I LIKE 00010002%  --- equals true
    
    SELECT  
      organization.path, 
      ARRAY(
       SELECT 
         name 
       FROM
         organization o_ 
       WHERE 
         organization.path LIKE o_.path || '%' 
      )
    FROM 
      organization 
    
    organization.path array
    0001 {root}
    00010001 {root, org_a}
    00010002 {root, org_b}
    000100020001 {root, org_b, org_b1}

    Django doesn't provide an out-of-the-box solution for switching the arguments around in a .filter(path__startswith='pattern') lookup (as is required in our case here). That's why I'm using the RawSQL expression.

    >>> from django.db.models.expressions import RawSQL
    
    >>> orgs = Organization.objects.annotate(
        ancestors=RawSQL(
            """
            ARRAY(
              SELECT name FROM organization o_ 
              WHERE organization.path LIKE o_.path || '%%'
            ) 
            FROM organization
            """,
            params=[],
        )
    )
    
    >>> orgs[0].ancestors
    ['Root', "Org 1", "Org 2", "Org 3"]