python django django-models aggregation django-annotate

Complex aggregation in Django

Using Django rest framework 3.x and Django 1.1.10. I have a model that represents users. When I list all the users by accessing /users/ endpoint in DRF the list has to include some more data that is related to users via another model, called Owner. Each Item has an Owner and Owners have Users.

I made an extra property on the User model and it just returns a JSON array of the data. This is something that I can't change, because it is a requirement on the front-end. I have to return a total number of items that are related to each user and there are three different counts to perform to get the data.

I need to get multiple count()'s of items on the same model but with different conditions.

Doing these separately is easy, two are trivial and the last one is more complicated:

Item.objects.filter(owner__user=self).count()
Item.objects.filter(owner__user=self, published=True).count()
Item.objects.filter(Q(history__action__name='argle') | Q(history__action__name='bargle'),
                    history__since__lte=now,
                    history__until__gte=now,
                    owner__user=self).count()

The problem is because this is run for every user and there are many of them. In the end this generates more than 300 DB queries and I would like to bring these to a minimum.

So far I've came up with this:

Item.objects.filter(owner__user=self)\
            .aggregate(published=Count('published'),
                       total=Count('id'))

This will aggregate first two counts, return them and only one SELECT will be performed on the database. Is there a way to incorporate the last count() call into that same aggregate()?

I tried many things, but it seems impossible. Should I just write a custom SELECT and use Item.objects.raw()?

I also noticed that performing the aggregate() and the last count() is faster on my development machine and SQLite than on the staging server with Postgresql, which is a tad strange, but it's not my main concern right now.

Solution

Since you require the counts for each item in your QuerySet you should use annotate instead of aggregate, this will then only perform 1 query.

The best way for counting related objects based on a condition is to use conditional aggregation

User.objects.annotate(
    total_items=Count('items'),
    published=Sum(Case(When(items__published=True, then=1), output_field=IntegerField())),
    foo=Sum(Case(When(
        Q(history__action__name='argle') | Q(history__action__name='bargle'),
        history__since__lte=now,
        history__until__gte=now,
        then=1
    ), output_field=IntegerField()))
)