Search code examples
djangodjango-queryset

django - improve performance of __in queryset in M2M filtering


I have a models that has a M2M relationship to another model.

These are my models:

class Catalogue(models.Model):
    city = models.CharField(db_index=True,max_length=100, null=True)
    district = models.CharField(db_index=True,max_length=100, null=True)
    type = models.ManyToManyField(Type, db_index=True)
    datetime = models.CharField(db_index=True, max_length=100, null=True)


class Type(models.Model):
    name = models.CharField(max_length=100)

    def __str__(self):
        return self.name

And this is views.py:

class all_ads(generic.ListView):
    paginate_by = 12
    template_name = 'new_list_view_grid-card.html'
    def get_queryset(self):
        city_district = self.request.GET.getlist('city_district')
        usage = self.request.GET.get('usage')
        status = self.request.GET.get('status')

        last2week = datetime.datetime.now() - datetime.timedelta(days=14)

        status = status.split(',')

        if usage:
            usage = usage.split(',')
        else:
            usage = ['1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31']

        intersections = list(set(status).intersection(usage))
        type_q = (Q(type__in=intersections) & Q(type__isnull=False))

        result = models.Catalogue.objects.filter(
            Q(datetime__gte=last2week) &
            type_q &
            ((reduce(operator.or_, (Q(city__contains=x) for x in city_district)) & Q(city__isnull=False)) |
             (reduce(operator.or_, (Q(district__contains=x) for x in city_district)) & Q(district__isnull=False)))
        ).distinct().order_by('-datetime').prefetch_related('type')

        return result

I want to filter MySQL db with some queries and return result in a listview.

It works good on a small database, but with large database it takes over 10 seconds to return results. If I delete type_q query, It takes 2 seconds (reduce 10 second!).

How can I improve performance of __in queryset?


Solution

  • It looks like type_q itself is not really the culprit, but acts as a multiplier, since now we make a LEFT OUTER JOIN, and thus the __contains runs over all combinations. This is thus more a peculiarity of two filters that work together

    We can omit this with:

    cat_ids = list(Catalogue.objects.filter(
        Q(*[Q(city__contains=x) for x in city_district], _connector=Q.OR) |
        Q(*[Q(district__contains=x) for x in city_district], _connector=Q.OR)
    ).values_list('pk', flat=True))
    
    result = models.Catalogue.objects.filter(
        Q(datetime__gte=last2week),
        type_q,
        pk__in=cat_ids
    ).distinct().order_by('-datetime').prefetch_related('type')

    Some database (MySQL is known to not optimize a subquery very well), can even do that with a subquery with. So here we do not materialize the list, but let Django work with a subquery:

    cat_ids = Catalogue.objects.filter(
        Q(*[Q(city__contains=x) for x in city_district], _connector=Q.OR) |
        Q(*[Q(district__contains=x) for x in city_district], _connector=Q.OR)
    ).values_list('pk', flat=True)
    
    result = models.Catalogue.objects.filter(
        Q(datetime__gte=last2week),
        type_q,
        pk__in=cat_ids
    ).distinct().order_by('-datetime').prefetch_related('type')