Search code examples
djangodjango-modelsdjango-orm

Annotating a Django QuerySet with the count of a Subquery


I'm building a job board. Each Job could have several associated Location objects.

I have designed my Location and Job models as follows:

class Location(BaseModel):
    slug = models.CharField(unique=True)
    city = models.OneToOneField(to="City", null=True)
    state = models.ForeignKey(to="State", null=True)
    country = models.ForeignKey(to="Country")


class Job(BaseModel):
    title = models.CharField()
    description = models.TextField()
    locations = models.ManyToManyField(
        to="Location",
        related_name="jobs",
        through="JobLocation",
    )

So, a Location object has the flexibility to refer to our idea of a country (like United States), a state (like New York) or a city (like Manhattan). I populate the slug field of Location model like so:

  • If Location object is a country, I use the country name.

    • united-states
  • If Location object is a state, I use the (state name + country name).

    • new-york-united-states
  • If Location object is a city, I use (city name + state name + country name).

    • manhattan-new-york-united-states

With slug field populated in this manner, I can simplify querying all the jobs in a particular location using the endswith lookup. For example, if there's a Python job in Manhattan, NY and a React job in Brooklyn, NY, I can get all the jobs in the state of New York like so:

Job.objects.filter(locations__slug__endswith="new-york-united-states").distinct()

Now, I would like to get a list of all my Location objects, and have each Location item annotated with number_of_jobs. This number_of_jobs should be the count of all jobs within that particular Location.

For example:

  • Manhattan Location object should have number_of_jobs=1 (The Python job)
  • Brooklyn Location object should have number_of_jobs=1 (The React job)
  • New York Location object should have number_of_jobs=2 (Python job + React job)

Attempt 1:

The simplest solution, which is to Count the Location.jobs doesn't work. It shows 1 job for Manhattan and 1 job for Brooklyn, but 0 jobs for New York.

Location.objects.annotate(number_of_jobs=Count("jobs", distinct=True))
# 1 job for Manhattan
# 1 job for Brooklyn
# 0 job for New York 

Attempt 2:

subquery = Job.objects.filter(locations__slug__endswith=OuterRef("slug")).distinct().count()

Location.objects.annotate(number_of_jobs=Subquery(subquery))

This also doesn't work, because count() executes the queryset immediately.

Attempt 3:

Based on Yuvraj's response, I managed to get absolutely close to the answer, but not quite.

jobs = Job.objects.filter(locations__slug__endswith=OuterRef("slug"))

subquery = jobs.annotate(job_count=Count("id", distinct=True)).values("job_count")[:1]

Location.objects.annotate(number_of_jobs=Subquery(subquery))

If I run these expressions using debugsqlshell of django-debug-toolbar, and I copy the SQL that Django creates, and remove the GROUP BY clause, I get my answer 100% correctly. I don't yet understand why Django adds GROUP BY and why removing it gives the correct answer.


I'm at my wits' end with this queryset. I'd appreciate any help.


Solution

  • First off, thank you Yuvraj for your response. This answer builds on top of yours.

    Turns out, Django automatically adds a GROUP BY clause when you use the Count aggregate function. I don't fully understand why it adds the GROUP BY clause, and why removing it returns the correct results in my case, so I won't speculate. But here's the answer, also based on Sergei's comment on this answer:

    class NonAggregateCount(Count):
        # Gets rid of Django's automatic GROUP BY clause
        contains_aggregate = False
    
    jobs = Job.objects.filter(locations__slug__endswith=OuterRef("slug"))
    
    subquery = jobs.annotate(job_count=NonAggregateCount("id", distinct=True)).values("job_count")
    
    Location.objects.annotate(number_of_jobs=Subquery(subquery))