Search code examples
sqldjangodjango-modelsdjango-ormdjango-aggregation

Wrong GROUP BY field in Django annotate query


The original problem caused by quite awkward cycling models reference:

# A -> B -> A

class A:
    b = models.ForeignKey('B', null=True, blank=True)

class B:
    a = models.ForeignKey('A')

Now, when I'm trying to annotate query, it always uses GROUP BY a's id from LEFT OUTER JOIN ( T3.id in the example below) instead of a.id.

Example:

A.objects.select_related('b', 'b__a').annotate(reviews=Count('reviews'))

Generated SQL:

SELECT 
    `a`.`id`,
    `b`.`id`,
    T3.`id`,
FROM
    `a`
        LEFT OUTER JOIN
    `b` ON (`a`.`b_id` = `b`.`id`)
        LEFT OUTER JOIN
    `a` T3 ON (`b`.`a_id` = T3.`id`)
WHERE
    `a`.`id` IN (1, 2, 3, 4, 5)
GROUP BY T3.`id`
ORDER BY NULL;

I know I can do next things:

  1. Change model not to do cycling reference (unfortunately can't do that right now)
  2. Can use .extra() instead of annotations (I'd try to avoid it)
  3. Remove .select_related() call (can't do due to performance issues)

UPD: Using GROUP BY T3.id will exclude results, where a.b == None

The best solution for me would be just specifying correct field in GROUP BY clause, but I don't know how. Is it possible? Is there any other way to fix the problem? Thanks.


Solution

  • Opened Django compiler:

    def collapse_group_by(self, expressions, having):
        # If the DB can group by primary key, then group by the primary key of
        # query's main model. Note that for PostgreSQL the GROUP BY clause must
        # include the primary key of every table, but for MySQL it is enough to
        # have the main table's primary key. Currently only the MySQL form is
        # implemented.
        # MySQLism: however, columns in HAVING clause must be added to the
        # GROUP BY.
        if self.connection.features.allows_group_by_pk:
            # The logic here is: if the main model's primary key is in the
            # query, then set new_expressions to that field. If that happens,
            # then also add having expressions to group by.
            pk = None
            for expr in expressions:
                if (expr.output_field.primary_key and
                        getattr(expr.output_field, 'model') == self.query.model):
                    pk = expr
                    # HERE BREAKPOINT REQUIRED
            if pk:
                expressions = [pk] + [expr for expr in expressions if expr in having]
        return expressions
    

    So, collapse_group_by function doesn't stop looking for pk even it's found already, that's why grouping by is done by T3.id instead of a.id (and thus I have missing results). To fix the problem, breakpoint is required inside for loop (marked in comments).

    UPD: the issue already fixed in Django 1.8.2 version https://code.djangoproject.com/ticket/24748