Search code examples
pythondjangodjango-modelsaggregationdjango-annotate

Complex aggregation in Django


Using Django rest framework 3.x and Django 1.1.10. I have a model that represents users. When I list all the users by accessing /users/ endpoint in DRF the list has to include some more data that is related to users via another model, called Owner. Each Item has an Owner and Owners have Users.

I made an extra property on the User model and it just returns a JSON array of the data. This is something that I can't change, because it is a requirement on the front-end. I have to return a total number of items that are related to each user and there are three different counts to perform to get the data.

I need to get multiple count()'s of items on the same model but with different conditions.

Doing these separately is easy, two are trivial and the last one is more complicated:

Item.objects.filter(owner__user=self).count()
Item.objects.filter(owner__user=self, published=True).count()
Item.objects.filter(Q(history__action__name='argle') | Q(history__action__name='bargle'),
                    history__since__lte=now,
                    history__until__gte=now,
                    owner__user=self).count()

The problem is because this is run for every user and there are many of them. In the end this generates more than 300 DB queries and I would like to bring these to a minimum.

So far I've came up with this:

Item.objects.filter(owner__user=self)\
            .aggregate(published=Count('published'),
                       total=Count('id'))

This will aggregate first two counts, return them and only one SELECT will be performed on the database. Is there a way to incorporate the last count() call into that same aggregate()?

I tried many things, but it seems impossible. Should I just write a custom SELECT and use Item.objects.raw()?

I also noticed that performing the aggregate() and the last count() is faster on my development machine and SQLite than on the staging server with Postgresql, which is a tad strange, but it's not my main concern right now.


Solution

  • Since you require the counts for each item in your QuerySet you should use annotate instead of aggregate, this will then only perform 1 query.

    The best way for counting related objects based on a condition is to use conditional aggregation

    User.objects.annotate(
        total_items=Count('items'),
        published=Sum(Case(When(items__published=True, then=1), output_field=IntegerField())),
        foo=Sum(Case(When(
            Q(history__action__name='argle') | Q(history__action__name='bargle'),
            history__since__lte=now,
            history__until__gte=now,
            then=1
        ), output_field=IntegerField()))
    )