Tags: django, django-orm, manytomanyfield, django-pagination

Optimization of big queryset/context in django


I have to present a very complex page with a lot of data coming from 3 different tables related by ForeignKey and ManyToManyField... I was able to do what I want, but performance is terrible and I'm stuck trying to find a better approach... here is the detailed code:

Models:

class CATSegmentCollection(models.Model):
    theFile = models.ForeignKey('app_file.File', related_name='original_file')
    segmentsMT = models.ManyToManyField('app_mt.MachineTransTable', related_name='segMT', blank=True,)
    segmentsTM = models.ManyToManyField('app_tm.TMTable',           related_name='segTM', blank=True, through='app_cat.TM_Source_quality',)
    ...

class TM_Source_quality(models.Model):
    catSeg = models.ForeignKey('app_cat.CATSegmentCollection')
    tmSeg = models.ForeignKey('app_tm.TMTable')
    quality = models.IntegerField()

class MachineTransTable(models.Model):
    mt = models.ForeignKey('app_mt.MT_available', blank=True, null=True, )
    ...

class TMTable(models.Model):
    ...

From these models (I just wrote what is relevant to my problem) I present all the CATSegmentCollection entries related to a single file, together with their associated TM and MT segments. In other words, each entry in CATSegmentCollection has zero or more TM segments from the TMTable table and zero or more MT segments from the MachineTransTable table.

This is what I do in the ListView (I use AjaxListView because I'm using infinite scrolling pagination from django-el-pagination):

class CatListView(LoginRequiredMixin, AjaxListView):
    model = CATSegmentCollection  # was `Model`; ListView expects lowercase `model`
    template_name = 'app_cat/cat.html'
    page_template = 'app_cat/cat_page.html'

    def get_object(self, queryset=None):
        obj = File.objects.get(id=self.kwargs['file_id'])
        return obj

    def get_queryset(self):
        theFile = self.get_object()
        return CATSegmentCollection.objects.filter(theFile=theFile).prefetch_related('segmentsMT', 'segmentsTM').order_by('segment_order')

    def get_context_data(self, **kwargs):
        context = super(CatListView, self).get_context_data(**kwargs)
        contextSegment = []
        myCatCollection = self.get_queryset()
        theFile = self.get_object()
        context['file'] = theFile
        for aSeg in myCatCollection:
            contextTarget = []
            if aSeg.segmentsTM.all():
                for aTargetTM in aSeg.tm_source_quality_set.all():
                    percent_quality = ...
                    contextTarget.append( {
                        "source"        : aTargetTM.tmSeg.source,
                        "target"        : aTargetTM.tmSeg.target,
                        "quality"       : str(percent_quality) + '%',
                        "origin"        : "TM",
                        "orig_name"     : aTargetTM.tmSeg.tm_client.name,
                        "table_id"      : aTargetTM.tmSeg.id,
                    })
            if aSeg.segmentsMT.all():
                for aTargetMT in aSeg.segmentsMT.all():
                    contextTarget.append( {
                        "target"    : aTargetMT.target,
                        "quality"   : "",
                        "origin"    : "MT",
                        "orig_name" : aTargetMT.mt.name,
                        "table_id"  : aTargetMT.id
                    })
            contextSegment.append( {
                "id"            : aSeg.id,
                "order"         : aSeg.segment_order,
                "source"        : aSeg.source,
                "target"        : contextTarget,
            })
        context['segments'] = contextSegment
        return context

Everything works but:

  • I hit the DB each time I call aSeg.segmentsTM.all() and aSeg.segmentsMT.all(), because I guess the prefetch is not preventing it... this results in hundreds of duplicated queries
  • All these queries are repeated each time I load more entries from the pagination (in other words, each time more entries are presented because of scrolling, the full set of entries is requested... I also tried lazy_paginate but nothing changed)
  • In principle, all the logic I have in get_context_data (there is more, but I just presented the essential code) could be reproduced in the template, passing just the queryset... or on the client with a lot of jQuery/JavaScript code, but I don't think it's a good idea to proceed like this...

So my question is... can I optimize this code, reducing the number of DB hits and the time to produce the response? Just to give you an idea, a relatively small file (with 300 entries in CATSegmentCollection) loads in 6.5 s with 330 queries (more than 300 of them duplicated) taking 0.4 s. The DJDT time analysis gives:

domainLookup     273 ms
connect          273 ms
request          275 ms
response        9217 ms
domLoading      9225 ms

Any suggestions? Thanks


Solution

  • Following serg's suggestion I started to dig into the problem, and in the end I was able to prefetch all the needed info. I guess that using a through table changes the way prefetching works... Here is the correct queryset:

    from django.db.models import Prefetch

    all_cat_seg = CATSegmentCollection.objects.filter(theFile=theFile).order_by('segment_order')
    all_tm_source_quality_entries = TM_Source_quality.objects.filter(catSeg__in=all_cat_seg).select_related('tmSeg', 'tmSeg__tm_client')
    prefetch = Prefetch('tm_source_quality_set', queryset=all_tm_source_quality_entries)
    queryset = CATSegmentCollection.objects.filter(theFile=theFile).prefetch_related(
        prefetch,
        'segmentsMT',
        'segmentsMT__mt',
    ).order_by('segment_order')
    

    With this queryset, I was able to reduce the number of queries to 10...