Search code examples
sqldjangocommon-table-expressiondjango-orm

Django Query API select_related in multi-table count query


Imagine I'm trying to count the number of tree cultivars in a certain park, and the trees are grouped in specific fields within the park. One catch is that trees can be replanted, so I want to count only existing trees and empty locations (removed IS NULL for trees currently planted, and the LEFT and RIGHT JOIN for locations not yet planted). After wandering through lots of Django select_related posts and documentation, I cannot see how to do this through Django's Query API. I have the park ID, so I'd be doing this starting with the Parks model in views.py.

Am I missing something obvious (or not obvious) in Django's Query API?

The following SQL does what I'd like:

WITH p AS (
    SELECT t.cultivar, l.id AS loc
    FROM trees t
    JOIN locations l ON t.location = l.id
    JOIN fields d ON l.field = d.id 
    WHERE d.park = 'SOME_PARK_ID'
        AND t.removed IS NULL
    )
SELECT c.name, count(*)
FROM p
LEFT JOIN cultivars c ON p.cultivar = c.id
RIGHT JOIN locations l ON p.loc = l.id
GROUP BY name;

For example:

+--------+-------+
| name   | count |
|--------+-------|
| <null> | 2     |
| HGG    | 2     |
| BOX    | 3     |
| GRV    | 1     |
+--------+-------+

Possible views.py:

class ParkDetail( DetailView ):
    model = Parks
    # Would like to get a count of cultivars here
    def get_context_data( self, **kwargs ):  # ... Lost here
        qry = Parks.objects.select_related( 'tree__location__field' ).filter( tree.removed is None )
        tree_count = qry.annotate( Count( ... ) )

Germane parts of models.py:

class Parks( models.Model ):
    name = models.CharField( max_length = 64, blank = False )

class Fields( models.Model ):
    park = models.ForeignKey( Parks, on_delete = models.CASCADE )

class Locations( models.Model ):
    field = models.ForeignKey( Fields, on_delete = models.CASCADE )

class Cultivars( models.Model ):
    name = models.CharField( max_length = 64, blank = False )

class Trees( models.Model ):
    location = models.ForeignKey( Locations, on_delete = models.CASCADE )
    cultivar = models.ForeignKey( Cultivars, on_delete = models.PROTECT )
    planted = models.DateField( blank = False )
    removed = models.DateField( blank = True, null = True )


Solution

  • I don't see how you need .select_related(…) here. You could count the Trees not removed for a given park_id with:

    from django.db.models import Count
    
    Cultivars.objects.filter(
        tree__removed=None, tree_location__field__park_id=my_park_id
    ).annotate(tree_count=Count('tree'))

    The Cultivars arising from this QuerySet will have an extra attribute named .tree_count with the number of not-removed Trees with my_park_id as park_id.

    If you also want to retrieve Cultivarss for which no tree was planted, we can work with:

    from django.db.models import Case, Count, F, Value, When
    
    Cultivars.objects.annotate(
        tree_count=Count(
            Case(
                When(
                    tree__removed=None,
                    tree_location__field__park_id=my_park_id,
                    then=F('tree'),
                ),
                default=Value(None),
            )
        )
    )