Search code examples
djangodjango-orm

Django ORM - 'annotate' and 'order_by' doesn't seem to work equivalently to 'GROUP BY' and 'ORDER BY'?


Summary:

ordered_tags = tagged_items.annotate(freq=Count('tag_id')).order_by('-freq')
    for i in ordered_tags:
        print(i.tag_id, i.tag, i.freq)

This does not work equivalently to GROUP BY tag_id ORDER BY COUNT(tag_id) DESC. Why?


Using Django ORM, I am trying to do something like:

SELECT tag_id, COUNT(tag_id) AS freq
FROM taggit_taggeditem
WHERE content_type_id IN (
    SELECT id FROM django_content_type
    WHERE app_label = 'reviews'
    AND model IN ('problem', 'review', 'comment')
) AND (
    object_id = 1
    OR object_id IN (
        SELECT id FROM review
        WHERE problem_id = 1
    ) OR object_id IN (
        SELECT c.id FROM comment AS c
        INNER JOIN review AS r
        ON r.id = c.review_id
    )
) 
GROUP BY tag_id
ORDER BY freq DESC;

So here's what I have contrived:

    querydict_for_content_type_id = {
        'current_app_label_query' : Q(app_label=ReviewsConfig.name),
        'model_name_query' : Q(model__in=['problem', 'review', 'comment'])
    }

    # used in content_type_query
    query_for_content_type_id = reduce(operator.__and__, querydict_for_content_type_id.values())

    # Query relevant content_type from TaggedItem model.    
    content_type_query = Q(content_type_id__in=ContentType.objects.filter(query_for_content_type_id))

    # Query relevant object_id from TaggedItem model.
    object_query = Q(object_id=pk) | Q(object_id__in=problem.review_set.all())
    for review in problem.review_set.all():
        object_query |= Q(object_id__in=review.comment_set.all())

    tagged_items = TaggedItem.objects.filter(content_type_query&object_query)

    # JOIN Tag
    # tags = tagged_items.select_related('tag')
    
    # GROUP BY freq ORDER_BY freq DESC;
    ordered_tags = tagged_items.annotate(freq=Count('tag_id'))#.order_by('-freq')
    for i in ordered_tags:
        print(i.tag_id, i.tag, i.freq)

Django ORM doesn't work as intended. Calling distinct method on ordered_tags doen't work either.

Output:

18 tag1 1
19 tag2 1
20 tag3 1
...
18 tag1 1

Why am I getting 18 tag1 1 twice instead of 18 tag1 2?


Solution

  • .annotate() is not at all equivalent to GROUP BY.
    Generally using Django's ORM to translate SQL queries isn't a good idea. It is meant to help you solve a problem by translating your idea's into queries.
    It would not make any sens to get TaggedItem instances as results of an SQL queries containing a GROUP BY statement since those would not represent a row of the TaggedItem table.

    To get a GROUP BY statement, you need to get ride of the model layer by forcing the results to be dict.

    qs = TaggedItem.objects.all()
    qs = qs.values("tag_id")  # Any field of the TaggedItem table you want as entries in the result dict
    qs = qs.annotate(freq=Count("tag_id"))  # This, combined with the above line, will produce a group by tag_id
    qs = qs.order_by("-freq")