Search code examples
pythondjangoannotations

Django: annotate query of one model with count of different non-related model which is filtered by a field of first model


Long title in short : I have a complex annotate query to work with. Example Models:

class FirstModel(models.Model):
    master_tag = models.CharField()
    ... other fields

class SecondModel(models.Model):
    ref_name = models.CharField()

I want to fetch all objects from FirstModel with count of all objects from SecondModel if ref_name of this objects are same as master_tag of FirstModel object.

What i tried:

I tried using annotate with Subquery and OuterRef but can not get this to work as getting constant errors.

from django.db.models import OuterRef, Subquery, Count

sub_query = Subquery(
    SecondModel.objects.filter(ref_name=OuterRef("master_tag")).values_list("pk", flat=True)
)
FirstModel.objects.annotate(sm_count=Count(sub_query))

This gave me error : "django.db.utils.ProgrammingError: more than one row returned by a subquery used as an expression" I tried lots of other things one of which is putting ".count()" at the end of subquery but that causes another error as count tries to evaluate query eagerly and fails due to OuterRef.

So is there a way to fetch a query like this with the count annotation ? Any stupid mistakes i made in writing above query ?


Solution

  • I've been trying for a while to solve this with OuterRef and Subquery, but the main issue is that Subquery expressions are designed to return a single value, when in this case you need to return many.

    Django doesn't seem to directly support annotating a queryset with the count of unrelated model instances based on a field comparison.

    I managed to achieve what you want without Subquery, using RawSQL instead:

    from django.db.models.expressions import RawSQL
    
    # Annotate FirstModel queryset with RawSQL
    qs = FirstModel.objects.annotate(
        sm_count=RawSQL(
            """
            SELECT COUNT(*) FROM yourapp_secondmodel 
            WHERE yourapp_secondmodel.ref_name = yourapp_firstmodel.master_tag
            """,
            []
        )
    )
    
    # Check the sm_count values
    qs.values('master_tag','sm_count').order_by('-sm_count')
    

    Another option would be to iterate over all instances of FirstModel counting the times of ref_name appearing, but that is so inefficient that it would only be feasible in small datasets...

    In conclusion: this would be much easier if there was a foreign key between these models.

    I'd be glad to be proven wrong if someone is able to make it work with Subquery, but in the meantime the RawSQL may solve the issue for you and it will still return a queryset.



    Well scratch what I said before, I believe I got it to work, thanks to this answer:

    This approach uses the Count inside the SubQuery but also uses Coalesce to ensure a value when the Subquery returns None:

    from django.db.models import Count, OuterRef, Subquery, IntegerField
    from django.db.models.functions import Coalesce
    
    sub_query = SecondModel.objects.filter(ref_name=OuterRef('master_tag')).values('ref_name').annotate(count=Count('*')).values('count')
    
    # Apply the Subquery, similarly to the original approach
    # Use distinct() to avoid duplicates if master_tag exists multiple times in FirstModel
    qs = FirstModel.objects.annotate(sm_count=Coalesce(Subquery(sub_query,output_field=IntegerField()), 0)).distinct()
    qs.values('master_tag','sm_count').order_by('-sm_count')
    

    It is a bit more cumbersome but I think this will return the result you were looking for, and stop the Cardinality errors you are getting in the Subquery.