Search code examples
djangodjango-modelsdjango-debug-toolbar

How can I prefetch related objects in Django?


Assuming I have the following model with related methods:

class Turbine(models.Model):
    ...
    pass

def relContracts(self):
    contracts = self.contracted_turbines.all()
    return contracts

class Contract(models.Model):
    turbines = models.ManyToManyField(Turbine,related_name='contracted_turbines')

def _contracted_windfarm_name(self):
    windfarms = self.turbines.order_by().values_list("wind_farm__name", flat=True).distinct().select_related
    if len(windfarms) == 1:
        return windfarms[0]
    else:
        return ", ".join([str(x) for x in windfarms])
contracted_windfarm_name = property(_contracted_windfarm_name)

def _turbine_age(self):
    first_commisioning = self.turbines.all().aggregate(first=Min('commisioning'))['first']
    start = self.start_operation.year
    age = start - first_commisioning.year
    return age
turbine_age = property(_turbine_age)

Django-debug-toolbar tells me, that the functions "_contracted_windfarm_name" and "_turbine_age" result in database duplicates for each contract.

My contracts queryset is received by the following get_queryset method where I already prefetched 'turbines' for other methods succesfully:

def get_queryset(self, **kwargs):
    qs = super(ContractTableView, self).get_queryset().filter(active=True).prefetch_related('turbines', 'turbines__wind_farm')
    self.filter = self.filter_class(self.request.GET, queryset=qs)
    return self.filter.qs

I tried prefetching 'turbines__contracted_turbines' without being able to reduce the number of duplicates.

The _contracted_windfarm_name method is used to populate a column of a django-tables2 method as follows:

contracted_windfarm = dt2.Column(accessor='contracted_windfarm_name', verbose_name='Wind Farm', orderable=False)

Where am I mistaking? How can I prefetch the associated contracts of a turbine?

SOLUTION: First problem

I added a simple annotation to the queryset within the get_queryset() method:

def get_queryset(self, **kwargs):
    qs = super(ContractTableView, self).get_queryset()\
      .filter(active=True).prefetch_related('turbines', 'turbines__wind_farm')\
      .annotate(first_com_date=Case(When(turbines__commisioning__isnull=False, then=Min('turbines__commisioning'))))
    self.filter = self.filter_class(self.request.GET, queryset=qs)
    return self.filter.qs

This leads to a slight change in the _turbine_age() method:

def _turbine_age(self):
    first_commisioning = self.first_commisioning
    start = self.start_operation.year
    age = start - first_commisioning.year
    return age
turbine_age = property(_turbine_age)

SOLUTION: second problem

With the turbines__wind_farm being prefetched in the get_queryset() method, there is no need to call the distinct() method:

def _contracted_windfarm_name(self):
    windfarms = list(set([str(x.wind_farm.name) for x in self.turbines.all()]))
    if len(windfarms) == 1:
        return windfarms[0]
    else:
        return ", ".join([str(x) for x in windfarms])
contracted_windfarm_name = property(_contracted_windfarm_name)

All duplicated queries could be removed!

Thanks to @dirkgroten for his valuable contributions!


Solution

  • from django.db.models import Min
    
    class ContractManager(models.Manager):
        def with_first_commissioning(self):
            return self.annotate(first_commissioning=Min('turbines__commissioning'))
    
    class Contract(models.Model):
        objects = ContractManager()
        ...
    

    then Contract.objects.with_first_commissioning() returns you a Queryset with the additional first_commissioning value for each Contract. So in Contract._turbine_age() you can just remove the first line.

    Now the windfarm names case is a bit more complex. If you're using Postgresql (which supports StringAgg) you could similarly add in your ContractManager this queryset:

    from django.db.models import Subquery, OuterRef
    from django.contrib.postgres.aggregates import StringAgg
    
    def with_windfarms(self):
        wind_farms = WindFarm.objects.filter('turbines__contract'=OuterRef('pk')).order_by().distinct().values('turbines__contract')
        wind_farm_names = wind_farms.annotate(names=StringAgg('name', delimiter=', ')).values('names')
        return self.annotate(wind_farm_names=Subquery(wind_farm_names))
    

    then in your _contracted_windfarm_name() method, you can access self.wind_farm_names assuming you're looping through the results of the queryset (you should probably check with hasattr in case your method gets used in a different way).

    If you're not on Postgresql, then just change the queryset to perform a prefetch_related and then make sure you don't add any query-related logic after that:

    from django.db.models import Prefetch
    
    def with_windfarms(self):
         return self.prefetch_related(Prefetch('turbines', queryset=Turbine.objects.order_by().select_related('wind_farm').distinct('wind_farm__name')))
    

    so that in your _contracted_wind_farms method, you can do [str(x.wind_farm.name) for x in self.turbines]

    In both cases, I'm assuming somewhere in your views you loop through the contracts in the queryset:

    for contract in Contract.objects.with_first_commissioning():
        contract._turbine_age()...
    
    for contract in Contract.objects.with_windfarms():
        contract._contracted_windfarm_name()...